THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(FOUNDED BY H. C. CARVER)

The Official Journal of the Institute of
Mathematical Statistics 


VOLUME XIX 


1948




EDITED BY
S. S. WILKS, Editor

M. S. BARTLETT          HARALD CRAMÉR          J. NEYMAN
WILLIAM G. COCHRAN      W. EDWARDS DEMING      WALTER A. SHEWHART
ALLEN T. CRAIG          J. L. DOOB             JOHN W. TUKEY
C. C. CRAIG             W. FELLER              A. WALD
                        HAROLD HOTELLING

WITH THE COOPERATION OF

T. W. Anderson, Jr.     Churchill Eisenhart    H. B. Mann
David Blackwell         M. A. Girshick         Alexander M. Mood
J. H. Curtiss           Paul R. Halmos         Frederick Mosteller
J. F. Daly              Paul G. Hoel           H. E. Robbins
Harold F. Dodge         Mark Kac               Henry Scheffé
Paul S. Dwyer           E. L. Lehmann          Jacob Wolfowitz
                        William G. Madow


The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given
issue should be reported to the Secretary on or before the 16th of the
month preceding the month of that issue. The months of issue are March,
June, September and December.

Manuscripts for publication in the Annals of Mathematical Statistics 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts 
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in footnotes
should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 

Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 

The subscription price for the Annals is $8.00 inside the Western Hemisphere ...
Single copies $3.00. Back numbers are available
at $8.00 per volume or $3.00 per single issue.

Composed and Printed at the 
WAVERLY PRESS, Inc. 

Baltimore, Md., U. S. A. 


Entered as second-class matter at the Post Office
at Baltimore, Maryland, under the act of March 3, 1879





CONTENTS OF VOLUME XIX

Articles

Banerjee, K. S. Weighing Designs and Balanced Incomplete Blocks
Bliss, C. I., and Cochran, W. G. Discriminant Functions with Covariance
Brennan, J. F., and Housner, G. W. The Estimation of Linear Trends
Camp, Burton H. Generalization to N Dimensions of Inequalities of the Tchebycheff Type
Cochran, W. G., and Bliss, C. I. Discriminant Functions with Covariance
Dwyer, P. S., and MacPhail, M. S. Symbolic Matrix Derivatives
Epstein, Benjamin. Some Applications of the Mellin Transform in Statistics
Feller, W. On the Kolmogorov-Smirnov Limit Theorems for Empirical Distributions
Gurland, John. Inversion Formulae for the Distribution of Ratios
Harris, T. E. Branching Processes
Hartman, Philip, and Wintner, Aurel. On the Effect of Decimal Correlation on Errors of Observation
Herbach, Leon H. Bounds for Some Functions Used in Sequentially Testing the Mean of a Poisson Distribution
Hoeffding, Wassily. A Class of Statistics with Asymptotically Normal Distributions
Hoeffding, Wassily. A Non-Parametric Test of Independence
Hoel, Paul G. On the Uniqueness of Similar Regions
Housner, G. W., and Brennan, J. F. The Estimation of Linear Trends
Kac, M. On the Characteristic Functions of the Distributions of Estimates of Various Deviations in Samples from a Normal Population
Kempthorne, O. The Factorial Approach to the Weighing Problem
Kendall, David G. On the Generalized "Birth-and-Death" Process
Kincaid, W. M. Solution of Equations by Interpolation
Lehmann, E. L., and Stein, C. Most Powerful Tests of Composite Hypotheses. I. Normal Distributions
Lotka, Alfred J. Application of Recurrent Series in Renewal Theory
Madow, William G. On a Source of Downward Bias in the Analysis of Variance and Covariance
Madow, William G. On the Limiting Distributions of Estimates Based on Samples from Finite Universes
MacPhail, M. S., and Dwyer, P. S. Symbolic Matrix Derivatives
Mosteller, Frederick. A k-Sample Slippage Test for an Extreme Population
Nanda, D. N. Distribution of a Root of a Determinantal Equation
Nanda, D. N. Limiting Distribution of a Root of a Determinantal Equation
Plackett, R. L. Boundaries of Minimum Size in Binomial Sampling
Richards, Paul I. Probability of Coincidence for Two Periodically Recurring Events
Robbins, Herbert. Mixture of Distributions
Silber, Jack. Multiple Sampling for Variables
Stein, C., and Lehmann, E. L. Most Powerful Tests of Composite Hypotheses. I. Normal Distributions
Tukey, John W. Non-Parametric Estimation III. Statistically Equivalent Blocks and Multivariate Tolerance Regions: The Discontinuous Case
Votaw, D. F., Jr. Testing Compound Symmetry in a Normal Multivariate Distribution
Wald, Abraham. Asymptotic Properties of the Maximum Likelihood Estimate of an Unknown Parameter of a Discrete Stochastic Process
Wald, Abraham. Estimation of a Parameter when the Number of Unknown Parameters Increases Indefinitely with the Number of Observations
Wald, Abraham, and Wolfowitz, J. Optimum Character of the Sequential Probability Ratio Test
Wintner, Aurel, and Hartman, Philip. On the Effect of Decimal Correlation on Errors of Observation
Wold, Herman O. A. On Prediction in Stationary Time Series
Wolfowitz, J., and Wald, Abraham. Optimum Character of the Sequential Probability Ratio Test

Notes

Albert, G. E. Correction to "A Note on the Fundamental Identity of Sequential Analysis"
Aroian, Leo A. The Fourth Degree Exponential Distribution Function
Bacon, H. M. A Matrix Arising in Correlation Theory
Birnbaum, Z. W. On Random Variables with Comparable Peakedness
Chung, Kai Lai. On a Lemma by Kolmogoroff
Cramer, G. F. An Approximation to the Binomial Summation
Dixon, W. J. Table of Normal Probabilities for Intervals of Various Lengths and Locations
Guttman, Louis. A Distribution-Free Confidence Interval for the Mean
Guttman, Louis. An Inequality for Kurtosis
Horton, H. Burke. A Method for Obtaining Random Numbers
Hsu, L. C. Note on an Asymptotic Expansion of the nth Difference of Zero
Correction to "On the Charlier Type B Series"

Sons Poisson'Ketri- 

. .. 





















Marks, Eli S. A Lower Bound for the Expected Travel Among m Random Points
Murphy, R. B. Non-Parametric Tolerance Limits
Noether, Gottfried E. On Confidence Limits for Quantiles
Rasch, G. A Functional Equation for Wishart's Distribution
Robbins, Herbert. Convergence of Distributions
Robbins, Herbert. The Distribution of a Definite Quadratic Form
Robbins, Herbert. The Distribution of Student's t when the Population Means are Unequal
Smirnov, N. Table for Estimating the Goodness of Fit of Empirical Distributions
Tukey, John W. Approximate Weights
Walsh, John E. On the Use of the Non-Central t-Distribution for Comparing Percentage Points of Normal Populations

Miscellaneous

Abstracts of Papers
Adoption of the New Constitution
Book Reviews
Constitution and By-Laws of the Institute
News and Notices
Report of the Editor
Report of the Institute Committee on Teaching of Statistics
Report of the President of the Institute
Report of the Secretary-Treasurer of the Institute
Report on the Berkeley Meeting of the Institute
Report on the Chicago Meeting of the Institute
Report on the Madison Meeting of the Institute
Report on the New York Meetings of the Institute



























ON THE GENERALIZED “BIRTH-AND-DEATH” PROCESS 
By David G. Kendall
Magdalen College, Oxford

1. Introduction and Summary. The importance of stochastic processes in relation to problems of population growth was pointed out by W. Feller [1] in 1939. He considered among other examples the "birth-and-death" process in which the expected birth and death rates (per head of population per unit of time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper I shall give the complete solution of the equations governing the generalised birth-and-death process in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions of the time $t$. The mathematical method employed starts from M. S. Bartlett's idea of replacing the differential-difference equations for the distribution of the population size by a partial differential equation for its generating function. For an account of this technique,¹ reference may be made to Bartlett's North Carolina lectures [2].

The formulae obtained lead to an expression for the probability of the ultimate extinction of the population, and to the necessary and sufficient condition for a birth-and-death process to be of "transient" type. For transient processes the distribution of the cumulative population is also considered, but here in general it is not found possible to do more than evaluate its mean and variance as functions of $t$, although a complete solution (including the determination of the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the simple process in which the birth and death rates are independent of the time.

It is shown that a birth-and-death process can be constructed to give an expected population size $\bar n_t$ which is any desired function of the time $t$, and among the many possible solutions the unique one is determined which makes the fluctuation, $\operatorname{Var}(n_t)$, a minimum for all $t$.

The general theory is illustrated with reference to two examples. The first of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the cascade showers associated with cosmic radiation; here the birth rate is constant and the death rate is a constant multiple of the "age", $t$, of the process. The $\bar n_t$-curve is then Gaussian in form, and the process is always of transient type.

The second example is provided by the family of "periodic" processes, in which the birth and death rates are periodic functions of the time $t$. These appear well adapted to describe the response of population growth (or epidemic spread) to the influence of the seasons.

2. The formulation and solution of the equations for the general $(\lambda, \mu)$ process. Let the integer-valued time-dependent random variable $n_t$ measure at time $t$ the size of a population, and suppose that in an element of time $dt$ the only possible transitions (and their associated probabilities) are:

$$\begin{aligned}
n_{t+dt} &= n_t + 1, & &\lambda(t)\,n_t\,dt + o(dt);\\
n_{t+dt} &= n_t, & &1 - \{\lambda(t) + \mu(t)\}\,n_t\,dt + o(dt);\\
n_{t+dt} &= n_t - 1, & &\mu(t)\,n_t\,dt + o(dt).
\end{aligned} \tag{1}$$

As an initial condition it will be supposed that the population is descended from a single "ancestor", so that $n_0 = 1$, and thus

$$P_1(0) = 1, \qquad P_n(0) = 0 \quad (n \neq 1). \tag{2}$$

¹ It appears from some remarks by Arley and Borchsenius [5] that the generating function method was first employed in problems of this kind by Dr. C. Palm.

It then follows that the $P_n(t)$ must satisfy the differential-difference equations

$$\frac{\partial}{\partial t}P_n(t) = (n+1)\mu P_{n+1}(t) + (n-1)\lambda P_{n-1}(t) - n(\lambda+\mu)P_n(t), \qquad n \ge 1, \tag{3}$$

and

$$\frac{\partial}{\partial t}P_0(t) = \mu P_1(t) \tag{4}$$

(where for convenience of writing I have ceased to indicate explicitly the dependence of $\lambda$ and $\mu$ on the time). If $P_n(t)$ is defined to be zero when $n < 0$, the first of the above equations will then be true for all $n$, and accordingly the generating function

$$\varphi(z, t) \equiv \sum_{n=0}^{\infty} P_n(t)\,z^n \tag{5}$$

must satisfy the linear partial differential equation

$$\frac{\partial\varphi}{\partial t} = (\lambda z - \mu)(z - 1)\frac{\partial\varphi}{\partial z}; \tag{6}$$

the problem is to find the solution to this equation when it is coupled with the boundary condition $\varphi(z, 0) = z$.
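The differential-difference equations (3) and (4) can also be checked numerically before any generating-function argument is used. The following is a minimal sketch, not part of the original paper: it assumes constant rates, truncates the infinite system at a hypothetical cut-off `n_max`, and integrates with a classical fourth-order Runge-Kutta step. The function name and its parameters are illustrative only.

```python
import math

def integrate_birth_death(lam, mu, t_end, n_max=120, steps=2000):
    """Integrate equations (3)-(4) for constant rates lam, mu, starting from
    P_1(0) = 1. States above n_max are dropped (an assumed truncation)."""
    def rhs(p):
        d = [0.0] * len(p)
        for n in range(len(p)):
            from_above = (n + 1) * mu * p[n + 1] if n + 1 < len(p) else 0.0
            from_below = (n - 1) * lam * p[n - 1] if n >= 1 else 0.0
            d[n] = from_above + from_below - n * (lam + mu) * p[n]
        return d

    p = [0.0] * (n_max + 1)
    p[1] = 1.0                      # single ancestor, equation (2)
    h = t_end / steps
    for _ in range(steps):          # classical RK4 step
        k1 = rhs(p)
        k2 = rhs([x + 0.5 * h * k for x, k in zip(p, k1)])
        k3 = rhs([x + 0.5 * h * k for x, k in zip(p, k2)])
        k4 = rhs([x + h * k for x, k in zip(p, k3)])
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p
```

For rates well below the cut-off the truncation error is negligible, so `p[0]` can be compared with the closed-form extinction probability obtained later in the paper for constant rates.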

The equation (6) is of Lagrange's type, and can be solved in the usual manner; the auxiliary equation is

$$\frac{dz}{dt} = -\mu + (\lambda + \mu)z - \lambda z^2, \tag{7}$$

and while in particular examples it might be convenient to attack this equation directly, progress in general is more easily made by observing that (7) is an equation of Riccati type.² A fundamental property of a Riccati equation is that the general solution is a homographic function of the constant of integration, so that

$$z = \frac{f_1 + Cf_2}{f_3 + Cf_4},$$

and equally

$$C = \frac{f_1 - zf_3}{zf_4 - f_2},$$

where $f_1$, $f_2$, $f_3$ and $f_4$ are all functions of the time $t$. Thus the general solution of (6) is of the form

$$\varphi = \Phi\!\left(\frac{f_1 - zf_3}{zf_4 - f_2}\right),$$

and from the boundary condition $\varphi(z, 0) = z$ it then follows that

$$\varphi(z, t) = \frac{g_1(t) + z\,g_2(t)}{g_3(t) + z\,g_4(t)}.$$

² See, for example, G. N. Watson [4], pp. 93-94.

On expansion, one obtains

$$P_0(t) = \xi_t \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{8}$$

where $\xi_t$ and $\eta_t$ are functions of the time $t$. Thus, for the general $(\lambda, \mu)$ process, the population size at any time is distributed in a geometric series with a modified zero term.

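The next stage of the solution is to determine the functions $\xi_t$ and $\eta_t$. From (8),

$$\varphi(z, t) = \frac{\xi + (1 - \xi - \eta)z}{1 - \eta z}, \tag{9}$$

and if this expression for $\varphi$ be substituted in (6) it will be found³ that

$$(\eta\xi' - \xi\eta') + \eta' = \lambda(1 - \xi)(1 - \eta),$$

and

$$\xi' = \mu(1 - \xi)(1 - \eta).$$

Now let $U = 1 - \xi$ and $V = 1 - \eta$, so that

$$U'/U = -\mu V,$$

and

$$V' = (\mu - \lambda)V - \mu V^2.$$

The last equation is of Bernoulli's type and can be solved by writing

$$W = 1/V,$$

so that

$$W' + (\mu - \lambda)W = \mu.$$

Initially $\xi = \eta = 0$, and $U = V = W = 1$; the solution of the $W$-equation is therefore

$$W = e^{-\rho}\left\{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}, \tag{10a}$$

where the function $\rho$ is defined by

$$\rho(t) \equiv \int_0^t \{\mu(\tau) - \lambda(\tau)\}\,d\tau. \tag{11}$$

Integration by parts gives two other formulae for $W$ which will prove useful; they are

$$W = 1 + e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau, \tag{10b}$$

and

$$W = \tfrac{1}{2}\left\{(1 + e^{-\rho}) + e^{-\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau\right\}. \tag{10c}$$

The quantities $U$ and $V$, and hence also $\xi$ and $\eta$, can now be expressed in terms of $\rho$ and $W$, for

$$\frac{U'}{U} = -\frac{\mu}{W} = -\rho' - \frac{W'}{W},$$

and so

$$\xi_t = 1 - \frac{e^{-\rho}}{W} \quad\text{and}\quad \eta_t = 1 - \frac{1}{W}. \tag{12}$$

These results, together with (8), suffice to determine completely the $P_n(t)$ as functions of the time $t$.

³ Here $\xi' \equiv d\xi/dt$, etc.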
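The recipe of equations (8), (10a), (11) and (12) translates directly into a short numerical routine. The sketch below is illustrative rather than the author's procedure: it assumes the rate functions are supplied as Python callables, evaluates the two integrals by the trapezoidal rule, and returns the first few terms of the distribution. The function name, the step count and the number of terms are hypothetical choices.

```python
import math

def kendall_distribution(lam, mu, t, n_terms=50, steps=20000):
    """Return (xi, eta, [P_0, ..., P_{n_terms-1}]) at time t for rate
    functions lam(t) and mu(t), using equations (8), (10a), (11), (12)."""
    h = t / steps
    rho = 0.0                       # rho(tau), equation (11)
    integral = 0.0                  # integral of exp(rho) * mu, equation (10a)
    prev_rate = mu(0.0) - lam(0.0)
    prev_f = mu(0.0)                # exp(rho(0)) * mu(0) with rho(0) = 0
    for i in range(1, steps + 1):
        tau = i * h
        rate = mu(tau) - lam(tau)
        rho += 0.5 * h * (prev_rate + rate)
        f = math.exp(rho) * mu(tau)
        integral += 0.5 * h * (prev_f + f)
        prev_rate, prev_f = rate, f
    W = math.exp(-rho) * (1.0 + integral)           # equation (10a)
    xi = 1.0 - math.exp(-rho) / W                   # equation (12)
    eta = 1.0 - 1.0 / W
    probs = [xi] + [(1 - xi) * (1 - eta) * eta ** (n - 1)
                    for n in range(1, n_terms)]     # equation (8)
    return xi, eta, probs
```

A call such as `kendall_distribution(lambda t: 0.5, lambda t: 0.3 * t, 4.0)` would treat a process with constant birth rate and linearly increasing death rate; the particular numbers here are arbitrary.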

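It is easy to deduce formulae for the mean and variance of $n_t$ (these could also be obtained directly from (6)). For the mean,

$$\bar n_t = \frac{1 - \xi_t}{1 - \eta_t} = e^{-\rho(t)}, \tag{13}$$

while for the variance,

$$\operatorname{Var}(n_t) = \frac{(1 - \xi_t)(\xi_t + \eta_t)}{(1 - \eta_t)^2} = e^{-\rho}\left(2W - 1 - e^{-\rho}\right) = e^{-2\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau. \tag{14c}$$

Alternatively, using the other forms for $W$, one can write

$$\operatorname{Var}(n_t) = e^{-\rho}\left\{e^{-\rho} - 1 + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\} \tag{14a}$$

$$\phantom{\operatorname{Var}(n_t)} = e^{-\rho}\left\{1 - e^{-\rho} + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau\right\}. \tag{14b}$$

If the initial population $n_0 = N > 1$, these formulae for $\bar n_t$ and $\operatorname{Var}(n_t)$ are to be multiplied by $N$.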
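The mean and variance of a geometric series with a modified zero term, as in (8), can be confirmed by direct summation. The sketch below is an illustration only; the function names are hypothetical, and the closed forms used are $(1-\xi)/(1-\eta)$ for the mean and $(1-\xi)(\xi+\eta)/(1-\eta)^2$ for the variance.

```python
def modified_geometric_moments(xi, eta):
    """Closed-form mean and variance of the distribution (8)."""
    mean = (1.0 - xi) / (1.0 - eta)
    var = (1.0 - xi) * (xi + eta) / (1.0 - eta) ** 2
    return mean, var

def moments_by_summation(xi, eta, n_terms=2000):
    """Same moments by summing the series term by term (truncated)."""
    first = 0.0
    second = 0.0
    p = (1.0 - xi) * (1.0 - eta)        # P_1
    for n in range(1, n_terms + 1):
        first += n * p
        second += n * n * p
        p *= eta                         # P_{n+1} = eta * P_n
    return first, second - first ** 2
```

The two routines should agree to rounding error whenever $0 \le \eta < 1$ and the truncation point is large enough for $\eta^{n}$ to be negligible.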

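It is now a simple matter to apply these formulae to the Arley $(\lambda_0, \mu_1 t)$ process. It will be found that

$$\rho = \tfrac{1}{2}\mu_1 t^2 - \lambda_0 t,$$

and

$$W = 1 + \lambda_0 e^{-\rho(t)}\int_0^t e^{\rho(\tau)}\,d\tau.$$

The mean growth of the process therefore follows the Gaussian law

$$\bar n_t = e^{\lambda_0 t - \frac{1}{2}\mu_1 t^2},$$

while for the variance (using (14b), since $\lambda$ is a constant) one finds

$$\operatorname{Var}(n_t) = \bar n_t(1 - \bar n_t) + 2\lambda_0\,\bar n_t^{\,2}\int_0^t \frac{d\tau}{\bar n_\tau},$$

in agreement with Arley [3] and Bartlett [2]. The distribution of $n_t$ at time $t$ follows on inserting the above values of $\rho$ and $W$ into (8) and (12).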

3. The chances of extinction. The simplest special case is that in which $(\lambda, \mu)$ have the constant values $(\lambda_0, \mu_0)$; this is the process introduced by Feller [1] and later discussed by several writers.⁴ The formulae (13) and (14c) give at once the results

$$\bar n_t = e^{(\lambda_0 - \mu_0)t} \quad\text{and}\quad \operatorname{Var}(n_t) = \frac{\lambda_0 + \mu_0}{\lambda_0 - \mu_0}\,\bar n_t(\bar n_t - 1), \tag{15}$$

due to Feller, while since

$$W = \frac{\lambda_0\bar n_t - \mu_0}{\lambda_0 - \mu_0},$$

equations (8) and (12) give

$$P_0(t) = \frac{\mu_0(\bar n_t - 1)}{\lambda_0\bar n_t - \mu_0} \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{16}$$

where

$$\eta_t = \frac{\lambda_0}{\mu_0}P_0(t) = \frac{\lambda_0(\bar n_t - 1)}{\lambda_0\bar n_t - \mu_0}.$$

These formulae were first given by C. Palm.⁴ They actually hold only if $\lambda_0 \neq \mu_0$; in the case of equality, $W = 1 + \lambda_0 t$ and then

$$\bar n_t \equiv 1, \qquad \operatorname{Var}(n_t) = 2\lambda_0 t,$$

$$P_0(t) = \frac{\lambda_0 t}{1 + \lambda_0 t} \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{17}$$

where $\eta_t = P_0(t)$.

One particularly interesting point is that

$$P_0(t) \to 1 \text{ as } t \to \infty \text{ if } \lambda_0 \le \mu_0,$$

so that the population is "almost certain" to die out, even though in the critical case ($\lambda_0 = \mu_0$) the expected population size $\bar n_t$ has a constant value. The same is true for any initial size of population; the new expression for $P_0(t)$ is then simply equal to the former one raised to the power $n_0 = N$, and therefore tends to unity as before. This phenomenon of extinction was first noticed in a similar problem⁵ by Francis Galton and H. W. Watson; an account of their work is given in Appendix F of Galton's book [7].

⁴ See Arley [3], Arley and Borchsenius [5], Bartlett [2] and Kendall [6]. Palm's formulae (16) are stated without proof by Arley and Borchsenius, but it appears from their remarks that he used a generating function method probably identical with that later employed by Bartlett and myself.
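The behaviour of $P_0(t)$ for constant rates is easy to tabulate from (16) and (17). The following sketch is illustrative only (the function name is hypothetical); it also exhibits the supercritical case $\lambda_0 > \mu_0$, where letting $\bar n_t \to \infty$ in (16) gives the limit $\mu_0/\lambda_0$ for a single ancestor.

```python
import math

def extinction_probability(lam0, mu0, t, N=1):
    """P_0(t) for constant rates, from (16) or (17), raised to the power N
    for an initial population of N independent ancestors."""
    if lam0 == mu0:
        p0 = lam0 * t / (1.0 + lam0 * t)            # balanced case, (17)
    else:
        nbar = math.exp((lam0 - mu0) * t)           # mean growth, (15)
        p0 = mu0 * (nbar - 1.0) / (lam0 * nbar - mu0)
    return p0 ** N
```

For example, with $\lambda_0 = \mu_0 = 1$ the routine reproduces $P_0(9) = 0.9$, which shows how slowly extinction becomes certain in the balanced case.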

The formulae of the last section now make possible a discussion of the chances of extinction for the general $(\lambda, \mu)$ process. When $n_0 = 1$,

$$P_0(t) = \frac{\displaystyle\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}{1 + \displaystyle\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}, \tag{18}$$

and so the necessary and sufficient condition for the ultimate extinction of the population is that the integral

$$I \equiv \int_0^\infty e^{\rho(\tau)}\mu(\tau)\,d\tau \tag{19}$$

should be divergent.

Since the integrand is non-negative, $I$ must either diverge to positive infinity, or have a finite value. Hence in any case the population always has a definite chance of extinction, given by $I/(1 + I)$. When the initial population is $n_0 = N$, the generating function is

$$\varphi(z, t) = \left\{\frac{\xi + (1 - \xi - \eta)z}{1 - \eta z}\right\}^{N}, \tag{20}$$

so that

$$P_0(t) = \xi_t^{\,N},$$

and the chance of ultimate extinction is

$$\left(\frac{I}{1 + I}\right)^{N}, \tag{21}$$

which is or is not equal to unity for all $N$ indifferently.

Extinction is impossible, in the sense of being an event of zero probability, if and only if $\mu$ is identically zero, so that the process is one of reproduction only. It is also worth noting that a necessary but not sufficient condition for almost certain extinction is the divergence of the integral

$$\int_0^\infty \mu(\tau)\,d\tau. \tag{22}$$

For if (22) had a finite value, $\rho(t)$ would be bounded above for all $t$, and so (19) could not be divergent. In general, when $I = \infty$ and the population is almost certainly doomed to extinction, I shall speak of the process as transient.

⁵ The extinction of family-names. Further references will be found in my paper [6].

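For a transient process it is of interest to consider the random variable $T$, defined to be the "age" of the process at the moment of extinction. Since

$$P_0(t) \equiv \text{Probability}\,\{T \le t\},$$

the probability distribution of $T$ is $P_0'(T)\,dT$, or

$$\frac{e^{\rho(T)}\mu(T)\,dT}{\left\{1 + \displaystyle\int_0^T e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}^2}, \qquad 0 < T < \infty. \tag{23}$$

For example, in the simplest birth-and-death process, when $\lambda$ and $\mu$ are equal constants, the distribution of $T$ is

$$\frac{\lambda_0\,dT}{(1 + \lambda_0 T)^2}, \qquad 0 < T < \infty. \tag{24}$$

This is for an initial population $n_0 = 1$; more generally, when $n_0 = N > 1$, the distribution of $T$ is

$$N P_0'(T)\,[P_0(T)]^{N-1}\,dT.$$

The median lifetime $T_m$ is determined by the relation

$$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = 1. \tag{25}$$

For the simple process, $T_m = 1/\lambda_0$ when $\lambda_0 = \mu_0$, and more generally

$$T_m = \frac{1}{\mu_0 - \lambda_0}\log\left(2 - \frac{\lambda_0}{\mu_0}\right) \qquad (\lambda_0 \neq \mu_0) \tag{26}$$

if $n_0 = 1$. When $n_0 = N > 1$, the formula for $T_m$ becomes

$$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = \frac{1}{2^{1/N} - 1}. \tag{27}$$

For the balanced process ($\lambda_0 = \mu_0$) it therefore follows that

$$T_m(N) = T_m(1)/(2^{1/N} - 1) \sim 1.44\,N\,T_m(1) \tag{28}$$

as $N$ tends to infinity. If the process is unbalanced, however, so that $\lambda_0 < \mu_0$, this asymptotic proportionality to $N$ does not hold, and instead

$$T_m(N) = \frac{1}{\mu_0 - \lambda_0}\log\left\{\frac{2^{1/N}\mu_0 - \lambda_0}{(2^{1/N} - 1)\mu_0}\right\} \sim \frac{\log N}{\mu_0 - \lambda_0} \tag{29}$$

as $N$ tends to infinity.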
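The median lifetime for constant rates can be evaluated by solving $P_0(T_m)^N = \tfrac12$ in closed form. The sketch below is an illustration under that assumption; the function name is hypothetical, and the error raised when the extinction probability never reaches one half (possible only when $\lambda_0 > \mu_0$) is a design choice of this sketch, not something stated in the paper.

```python
import math

def median_lifetime(lam0, mu0, N=1):
    """Median time to extinction for constant rates and N initial members,
    obtained from P_0(T)^N = 1/2."""
    root = 2.0 ** (1.0 / N)
    if lam0 == mu0:
        return 1.0 / (lam0 * (root - 1.0))          # balanced case
    arg = (root * mu0 - lam0) / ((root - 1.0) * mu0)
    if arg <= 0.0:
        raise ValueError("extinction probability never reaches 1/2")
    return math.log(arg) / (mu0 - lam0)
```

In the balanced case the ratio `median_lifetime(l, l, N) / (N * median_lifetime(l, l))` approaches $1/\log 2 \approx 1.44$ for large `N`, while for $\lambda_0 < \mu_0$ the growth is only logarithmic in `N`.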


4. The cumulative population. There is associated with a birth-and-death process another random variable, $M_t$, which is of importance in some applications. This is defined as follows: initially $M_0 = n_0$, while for $t > 0$, $M_t$ shares all the positive jumps of $n_t$.

For example, if $n_t$ represents the number of cases of a disease in a population at time $t$, $M_t$ will be the total number of cases which have been recorded up to that time. If the process is transient, so that the epidemic is almost certainly extinguished in the course of time, $M_\infty$ will then be a measure of its overall severity.

Again, if $n_t$ represents the viable count of a population of bacteria with a birth rate $\lambda(t)$ and a death rate $\mu(t)$, $M_t$ will be equal to the total count in which living and dead organisms are not distinguished.

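In order to discuss the joint variation of $n_t$ and $M_t$ it is necessary to introduce the new generating function

$$\psi(z, w, t) \equiv \sum_{n=0}^{\infty}\sum_{M=0}^{\infty} P_{n,M}(t)\,z^n w^M. \tag{30}$$

Here the $P_{n,M}(t)$ give the joint frequency-distribution of $n_t$ and $M_t$ at time $t$. By the usual argument the differential equation satisfied by the function $\psi$ will be found to be

$$\frac{\partial\psi}{\partial t} = \left(\lambda w z^2 - (\lambda + \mu)z + \mu\right)\frac{\partial\psi}{\partial z}, \tag{31}$$

and the associated boundary condition (if initially $n_0 = M_0 = 1$) is

$$\psi(z, w, 0) = zw. \tag{32}$$

I have not succeeded in solving this equation for general $\lambda(t)$ and $\mu(t)$; the solution when $\lambda$ and $\mu$ are constants will be given in the next section. It is however possible to find general expressions for the mean and variance of $M_t$; for this purpose it is more convenient⁶ to work with the cumulant-generating function

$$K(u, v, t) \equiv \log\psi(e^u, e^v, t). \tag{33}$$

This satisfies the differential equation

$$\frac{\partial K}{\partial t} = \left\{\lambda(e^{u+v} - 1) - \mu(1 - e^{-u})\right\}\frac{\partial K}{\partial u}, \tag{34}$$

and of course

$$K = u\,\bar n_t + v\,\bar M_t + \tfrac{1}{2}u^2\operatorname{Var}(n_t) + \tfrac{1}{2}v^2\operatorname{Var}(M_t) + uv\operatorname{Cov}(n_t, M_t) + \cdots. \tag{35}$$

Expanding both sides of the equation in powers of $u$ and $v$, and equating coefficients, one obtains the differential equations

$$\frac{d}{dt}\bar n_t = (\lambda - \mu)\bar n_t, \tag{36}$$

$$\frac{d}{dt}\operatorname{Var}(n_t) = (\lambda + \mu)\bar n_t + 2(\lambda - \mu)\operatorname{Var}(n_t), \tag{37}$$

$$\frac{d}{dt}\bar M_t = \lambda\bar n_t, \tag{38}$$

$$\frac{d}{dt}\operatorname{Var}(M_t) = \lambda\bar n_t + 2\lambda\operatorname{Cov}(n_t, M_t), \tag{39}$$

and

$$\frac{d}{dt}\operatorname{Cov}(n_t, M_t) = \lambda\bar n_t + \lambda\operatorname{Var}(n_t) + (\lambda - \mu)\operatorname{Cov}(n_t, M_t). \tag{40}$$

The solutions to the first two equations have of course already been given in section 2; from the third it follows that the mean value of $M_t$ is

$$\bar M_t = 1 + \int_0^t e^{-\rho(\tau)}\lambda(\tau)\,d\tau. \tag{41}$$

The solution of the fifth equation is

$$\operatorname{Cov}(n_t, M_t) = \bar n_t\int_0^t\left\{1 + \frac{\operatorname{Var}(n_\tau)}{\bar n_\tau}\right\}\lambda(\tau)\,d\tau, \tag{42}$$

and so the variance of $M_t$ is

$$\operatorname{Var}(M_t) = \int_0^t\left\{\bar n_\tau + 2\operatorname{Cov}(n_\tau, M_\tau)\right\}\lambda(\tau)\,d\tau. \tag{43}$$

⁶ Compare Bartlett [2].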



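In illustration of these formulae, consider first the Arley $(\lambda_0, \mu_1 t)$ process; from (41)

$$\bar M_t = 1 + \lambda_0\int_0^t e^{\lambda_0\tau - \frac{1}{2}\mu_1\tau^2}\,d\tau, \tag{44}$$

but the complete expression for $\operatorname{Var}(M_t)$ will be a multiple integral which does not appear to admit of much simplification.

For the simple $(\lambda_0, \mu_0)$ process, however, when $\lambda_0 < \mu_0$, it readily follows that

$$\bar M_t = \frac{\mu_0 - \lambda_0\bar n_t}{\mu_0 - \lambda_0}, \tag{45}$$

$$\operatorname{Cov}(n_t, M_t) = \frac{\lambda_0\bar n_t}{\mu_0 - \lambda_0}\left\{2\mu_0 t - \frac{(\lambda_0 + \mu_0)(1 - \bar n_t)}{\mu_0 - \lambda_0}\right\}, \tag{46}$$

and

$$\operatorname{Var}(M_t) = \frac{\lambda_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^2}(1 - \bar n_t) - \frac{4\lambda_0^2\mu_0\,t\,\bar n_t}{(\mu_0 - \lambda_0)^2} + \frac{\lambda_0^2(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}(1 - \bar n_t^{\,2}). \tag{47}$$

Thus in the limit, as $t \to \infty$, the mean and variance of $M_t$ are

$$\bar M_\infty = \frac{\mu_0}{\mu_0 - \lambda_0} \quad\text{and}\quad \operatorname{Var}(M_\infty) = \frac{\lambda_0\mu_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}, \tag{48}$$

the covariance of course tending to zero. If the process is balanced, so that $\lambda_0 = \mu_0$ and $\bar n_t \equiv 1$, the integral for $\bar M_t$ has the value $1 + \lambda_0 t$, which increases without limit as $t$ tends to infinity. This will always be so for a balanced process if the integral

$$\int_0^\infty \lambda(\tau)\,d\tau$$

is divergent.

If the initial population is equal to $N > 1$, and if all its members are counted in $M_0$, the only modification necessary to the above formulae is that in each case the right-hand side is to be multiplied by $N$.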

5. The distribution of the cumulative population for a simple process. The problem of solving (31), which appears intractable even if one only requires the asymptotic distribution determined by $\psi(1, w, \infty)$, can be solved completely in the specially simple case when the birth and death rates $\lambda(t)$ and $\mu(t)$ have the constant values $\lambda_0$ and $\mu_0$.

Let $\alpha$ and $\beta$ be the roots of the quadratic

$$\lambda_0 w z^2 - (\lambda_0 + \mu_0)z + \mu_0 = 0,$$

so chosen that $0 < \alpha < 1 < \beta$; then the general solution of (31) will be found by the usual method to be

$$\psi = \Psi\!\left\{\frac{z - \alpha}{\beta - z}\,e^{-\lambda_0 w(\beta - \alpha)t}\right\}. \tag{49}$$

The boundary condition $\psi(z, w, 0) = zw$ therefore gives

$$\psi = w\,\frac{\alpha(\beta - z) + \beta(z - \alpha)\,e^{-\lambda_0 w(\beta - \alpha)t}}{(\beta - z) + (z - \alpha)\,e^{-\lambda_0 w(\beta - \alpha)t}}, \tag{50}$$

and it may be noted that if $n_0 = M_0 = N > 1$, this formula for $\psi$ would have to be raised to the $N$th power. It will suffice, however, to discuss the simplest case when $n_0 = M_0 = 1$.

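Let the process be transient, so that $\lambda_0 \le \mu_0$; then the asymptotic frequency distribution of $M_t$ when $t \to \infty$ is determined by the generating function

$$\psi(1, w, \infty) = w\alpha = \frac{\lambda_0 + \mu_0 - \sqrt{(\lambda_0 + \mu_0)^2 - 4\lambda_0\mu_0 w}}{2\lambda_0}, \tag{51}$$

and here it is the positive square root which must be taken. The probability distribution of $M_\infty$ is thus

$$P\{M_\infty = M\} = \frac{\lambda_0 + \mu_0}{2\lambda_0}\cdot\frac{(2M)!}{2^{2M}(M!)^2}\cdot\frac{x^M}{2M - 1} \qquad (M = 1, 2, 3, \ldots), \tag{52}$$

where

$$x = \frac{4\lambda_0\mu_0}{(\lambda_0 + \mu_0)^2}. \tag{53}$$

The first few terms are

$$\frac{\mu_0}{\lambda_0 + \mu_0}, \quad \frac{\lambda_0\mu_0^2}{(\lambda_0 + \mu_0)^3}, \quad \frac{2\lambda_0^2\mu_0^3}{(\lambda_0 + \mu_0)^5}, \quad \ldots, \tag{54}$$

and it is easy to verify that the mean and variance of this distribution agree with the values given in the last section. When $\lambda_0 = \mu_0$, $x = 1$, and then the terms in (54) fall off to zero like $M^{-3/2}$, $\bar M_\infty$ being infinite (in accordance with the remarks at the end of section 4).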
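The agreement between the distribution (52) and the limiting moments (48) can be verified numerically. The sketch below is illustrative only: it builds the terms of (52) by a recurrence on the coefficient $(2M)!/\{2^{2M}(M!)^2(2M-1)\}$, whose successive ratio is $(2M-1)/\{2(M+1)\}$, so that no large factorials are formed. The function name and the truncation point are hypothetical.

```python
def cumulative_population_distribution(lam0, mu0, n_terms=2000):
    """Terms P{M_inf = M}, M = 1..n_terms, of distribution (52) for a
    constant-rate process with lam0 < mu0."""
    x = 4.0 * lam0 * mu0 / (lam0 + mu0) ** 2        # equation (53)
    scale = (lam0 + mu0) / (2.0 * lam0)
    probs = []
    coeff_times_power = 0.5 * x     # c_1 * x with c_1 = 2!/(2^2 * 1 * 1) = 1/2
    for M in range(1, n_terms + 1):
        probs.append(scale * coeff_times_power)
        # c_{M+1} / c_M = (2M - 1) / (2 (M + 1)), times one more factor of x
        coeff_times_power *= x * (2 * M - 1) / (2.0 * (M + 1))
    return probs
```

With $\lambda_0 = 0.5$ and $\mu_0 = 1$ the terms should sum to unity, with mean $\mu_0/(\mu_0-\lambda_0) = 2$ and variance $\lambda_0\mu_0(\lambda_0+\mu_0)/(\mu_0-\lambda_0)^3 = 6$.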


6. The determination of the process when its mean growth, $\bar n_t$, is given. Since $\bar n_t = e^{-\rho(t)}$ it follows that

$$\lambda(t) - \mu(t) = \frac{d}{dt}\log\bar n_t, \tag{55}$$

and thus if $\bar n_t$ is required to be a given function of the time, the birth and death rates must be chosen in accordance with (55); the only other condition is that for all $t$, $\lambda(t) \ge 0$ and $\mu(t) \ge 0$.

Arley has pointed out that the simple process ($\lambda(t) = c$, $\mu(t) = 0$) gives a smaller fluctuation, $\operatorname{Var}(n_t)$, than any other simple process with the same mean growth, say $(\lambda_0, \mu_0)$ where $\lambda_0 - \mu_0 = c$. This suggests that one should consider the more general question: if $\bar n_t$ is given for all $t$, for which choice of $\lambda(t)$ and $\mu(t)$ will the fluctuation $\operatorname{Var}(n_t)$ be a minimum?

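Suppose then that the whole region $t \ge 0$ consists of three sets of intervals, $E_1$, $E_2$ and $E_3$, and that within an interval of the set $E_j$,

$\bar n_t$ is a decreasing function if $j = 1$,
$\bar n_t$ is an increasing function if $j = 2$,
and $\bar n_t$ is a constant if $j = 3$.

Then one can write (each integral being taken over that part of the set which lies between $0$ and $t$)

$$\begin{aligned}
\operatorname{Var}(n_t) ={}& 2e^{-2\rho}\int_{E_1} e^{\rho(\tau)}\lambda(\tau)\,d\tau\\
&+ e^{-2\rho}\int_{E_1 + E_2}\left|d\,e^{\rho(\tau)}\right| + 2e^{-2\rho}\int_{E_2} e^{\rho(\tau)}\mu(\tau)\,d\tau\\
&+ e^{-2\rho}\int_{E_3} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau.
\end{aligned}$$

Here the terms involving $\lambda$ and $\mu$ explicitly are all non-negative, and so $\operatorname{Var}(n_t)$ will be a minimum for the (unique) choice of $\lambda$ and $\mu$ which makes them all vanish, namely:

$$\begin{aligned}
&\text{in } E_1,\quad \lambda(t) = 0 \ \text{ and } \ \mu(t) = -\bar n_t'/\bar n_t;\\
&\text{in } E_2,\quad \lambda(t) = \bar n_t'/\bar n_t \ \text{ and } \ \mu(t) = 0;\\
&\text{in } E_3,\quad \lambda(t) = \mu(t) = 0.
\end{aligned} \tag{56}$$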

However, when one is looking for a $(\lambda, \mu)$ process with a given $\bar n_t$ function, this minimum-fluctuation solution would frequently be an artificial one. For example, suppose it is required that $\bar n_t$ shall be a Gaussian curve, reducing to unity when $t = 0$; then

$$\bar n_t = e^{\lambda_0 t - \frac{1}{2}\mu_1 t^2}, \tag{57}$$

say, and $\lambda(t) - \mu(t) = \lambda_0 - \mu_1 t$; the most natural solution is then the Arley process,

$$\lambda(t) = \lambda_0, \qquad \mu(t) = \mu_1 t.$$


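It is of interest that a $(\lambda, \mu)$ process can be found for which the expected growth follows a logistic law,

$$\bar n_t = \frac{\alpha}{1 + (\alpha - 1)e^{-\beta t}} \qquad (\alpha > 1,\ \beta > 0). \tag{58}$$

According to (55) one must have

$$\lambda(t) - \mu(t) = \frac{(\alpha - 1)\beta}{e^{\beta t} + (\alpha - 1)}.$$

The minimum-fluctuation solution is

$$\lambda(t) = \frac{(\alpha - 1)\beta}{e^{\beta t} + (\alpha - 1)}, \qquad \mu(t) \equiv 0, \tag{59}$$

which satisfies the relation

$$\lambda(t) = \beta\left(1 - \frac{\bar n_t}{\alpha}\right), \tag{60}$$

as might have been expected, since the Verhulst-Pearl-Reed equation (which forms the deterministic basis for the logistic law) is

$$\frac{1}{n}\frac{dn}{dt} = \beta\left(1 - \frac{n}{\alpha}\right). \tag{61}$$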



7. "Periodic" birth-and-death processes. As a further example of the general theory it is worth considering the "periodic" processes for which the expected growth $\bar n_t$ is a function of the time which repeats itself with a period $\omega$. It will then follow that $\rho(t)$ and so also $\lambda(t) - \mu(t)$ have the period $\omega$, while $\rho(t)$ must be zero whenever $t$ is an integer multiple of $\omega$. The case considered here is that in which $\lambda$ and $\mu$ are separately periodic, and then it can be shown that

$$\bar n_t = n_0 \quad\text{and}\quad \operatorname{Var}(n_t) = k\,n_0\int_0^{\omega} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau \tag{62}$$

whenever $t = k\omega$, for every positive integer $k$. Thus, although the expected value of $n_t$ oscillates regularly, in practice this "periodicity" would be obscured by the rapid increase, with increasing $t$, in the magnitude of the random fluctuations (as measured by $\operatorname{Var}(n_t)$). Moreover, since

$$\int_0^{k\omega} e^{\rho(\tau)}\mu(\tau)\,d\tau = k\int_0^{\omega} e^{\rho(\tau)}\mu(\tau)\,d\tau,$$

it is clear that the process is necessarily transient, there being unit probability that $n_t$ will ultimately be reduced to zero.

Periodic birth-and-death processes are likely to be of importance in biology; it should be pointed out, however, that this type of process describes the stochastic modification of a regular periodicity imposed on the model from outside, and is not to be confused with other stochastic models which themselves generate irregular (non-phase-keeping) oscillations. The models discussed in this section are in fact suitable for the quantitative description of seasonal influences.

Before going into further detail it is natural to specialise the model by assuming that the functions $\lambda$ and $\mu$ are at most simply harmonic. If $n_0 = 1$ and since there is to be no damping, one will then have

$$\bar n_t = \exp\left[\alpha\{\sin\nu(t + \epsilon) - \sin\nu\epsilon\}\right] \qquad (\alpha > 0), \tag{63}$$

where $\nu\omega = 2\pi$, and $\alpha$ and $\epsilon$ are amplitude and phase constants, respectively. The functions $\lambda$ and $\mu$ are now to be determined from the relation

$$\lambda - \mu = \alpha\nu\cos\nu(t + \epsilon),$$

and this can be done in many ways. The minimum-fluctuation solution would here be artificial, and it is more natural to select two other solutions,

$$\lambda = \alpha\nu\{1 + \cos\nu(t + \epsilon)\}, \qquad \mu = \alpha\nu, \tag{64}$$

and

$$\lambda = \alpha\nu, \qquad \mu = \alpha\nu\{1 - \cos\nu(t + \epsilon)\}, \tag{65}$$

for further consideration. In the first of these the death rate is constant and the birth rate executes simple-harmonic oscillations, while in the second it is the birth rate which is constant, and the death rate which oscillates. It can be seen that, of all solutions of these two types, (64) and (65) are those with the least value for $\operatorname{Var}(n_t)$. From formulae (14a) and (14b) it will be found that, for either process,

$$\operatorname{Var}(n_t) = 4\pi k\alpha\,I_0(\alpha)\,e^{\alpha\sin\nu\epsilon} \quad\text{when } t = k\omega, \tag{66}$$

where $I_0(\alpha)$ is the Bessel function of zero order, of the first kind and of imaginary argument. (It will be noticed that, whenever $t$ is an integer multiple of $\omega$, the distribution of the population size $n_t$ is the same for the two models.) For small oscillations, when $t = k\omega$,

$$\operatorname{Var}(n_t) \sim 4\pi k\alpha \quad\text{as } \alpha \to 0, \tag{67}$$

since $I_0(0) = 1$, while for large oscillations

$$\operatorname{Var}(n_t) \sim 2k(2\pi\alpha)^{1/2}/\bar n_{\min} \quad\text{as } \alpha \to \infty. \tag{68}$$

(Here $\bar n_{\min}$ is the minimum value of $\bar n_t$.)
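Formula (66) can be checked against a direct numerical evaluation of (14c) over $k$ whole periods. The sketch below is illustrative only: it assumes the harmonic mean growth $\bar n_t = \exp[\alpha\{\sin\nu(t+\epsilon) - \sin\nu\epsilon\}]$, uses the midpoint rule for the integral, and evaluates $I_0$ from its power series $\sum_m (\alpha/2)^{2m}/(m!)^2$. All function names and the `model` switch are hypothetical.

```python
import math

def bessel_i0(a, terms=60):
    """Modified Bessel function I_0(a) from its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (a / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def periodic_variance(alpha, nu, eps, k, model, steps=20000):
    """Var(n_t) at t = k * omega from (14c), for model (64) or (65)."""
    omega = 2.0 * math.pi / nu
    h = k * omega / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h                      # midpoint rule
        phase = nu * (tau + eps)
        rho = -alpha * (math.sin(phase) - math.sin(nu * eps))
        if model == 64:
            rate_sum = alpha * nu * (2.0 + math.cos(phase))
        else:
            rate_sum = alpha * nu * (2.0 - math.cos(phase))
        total += math.exp(rho) * rate_sum * h
    return total    # rho(k * omega) = 0, so the factor exp(-2 rho) equals 1

def closed_form_variance(alpha, nu, eps, k):
    """Right-hand side of (66)."""
    return (4.0 * math.pi * k * alpha * bessel_i0(alpha)
            * math.exp(alpha * math.sin(nu * eps)))
```

Both models should return the same value at whole multiples of the period, because the cosine term integrates to zero against $e^{\rho}$ over a full cycle.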

The calculation of $P_0(\omega)$ presents some points of interest. For either model
it proves to be

(69)   $P_0(\omega) = \dfrac{2\pi\alpha I_0(\alpha) e^{\alpha \sin \nu\epsilon}}{1 + 2\pi\alpha I_0(\alpha) e^{\alpha \sin \nu\epsilon}}$;

this is the probability that a population element, known to be descended from a
single individual at time $t = 0$, will have become extinct one year later (if one
identifies the oscillations with a seasonal effect). It will be seen that $P_0(\omega)$
will be least when $\sin \nu\epsilon = -1$, and greatest when $\sin \nu\epsilon = +1$; i.e. when
$\bar{n}_t$ is expected to have a minimum, or a maximum, at $t = 0$, respectively.
Accordingly it follows that the progeny of a new member of the population is most
likely to survive till the following year if the "ancestor" commences its
"membership" at the time when the population would normally have its minimum size.
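Formulae (66) and (69) are easily evaluated numerically. The following Python sketch is an illustration only: it reads (66) as Var$(n) = 4\pi k\alpha I_0(\alpha)e^{\alpha \sin \nu\epsilon}$ and (69) as $P_0(\omega) = J/(1 + J)$ with $J = 2\pi\alpha I_0(\alpha)e^{\alpha \sin \nu\epsilon}$; the function names and the power-series evaluation of $I_0$ are assumptions of the example and do not come from the paper.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series (assumed helper)."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        # ratio of consecutive series terms: (x/2)^2 / (j+1)^2
        term *= (x / 2.0) ** 2 / ((j + 1) ** 2)
    return total


def variance_after_k_periods(alpha, sin_nu_eps, k=1):
    """Formula (66) as read above: Var(n) at t = k*omega."""
    return 4.0 * math.pi * k * alpha * bessel_i0(alpha) * math.exp(alpha * sin_nu_eps)


def extinction_after_one_period(alpha, sin_nu_eps):
    """Formula (69) as read above: probability of extinction one period later."""
    j = 2.0 * math.pi * alpha * bessel_i0(alpha) * math.exp(alpha * sin_nu_eps)
    return j / (1.0 + j)
```

With $\alpha = 0.5$, for instance, the value returned for $\sin \nu\epsilon = -1$ is smaller than that for $\sin \nu\epsilon = +1$, in line with the remark above about the most favourable season.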





In conclusion, I wish to thank Professor M. S. Bartlett for many helpful
discussions on the subject of this paper.




PROBABILITY OF COINCIDENCE FOR TWO PERIODICALLY
RECURRING EVENTS¹

By Paul I. Richards

Brookhaven National Laboratory

Summary. This paper contains a study of the following problem: Each of
two events recurs with definitely known period and duration, while the starting
time of each event is unknown. It is desired that, before the elapse of a certain
time, the events occur simultaneously and that this "overlap" be of at least a
given minimum duration.

The probability of this satisfactory coincidence is first evaluated, and it is
found that the solution, while mathematically adequate, is of no value for
practical application. This circumstance arises from the possibility that, with
certain rational ratios of the periods, the events may "lock in step". Accordingly,
an attempt is made to smooth the probability function with respect to
small variations in the ratio of the periods. Due to difficulties in manipulating
the number-theoretic expressions involved, this smoothing is carried through
only by the use of certain approximations. Moreover, because of these major
difficulties, an averaged value of the probability itself is not obtained, but, in
its stead, there is derived a formula for that fraction of randomly related repeated
trials in which the original probability will be less than one-half.

Thus, the original problem is not completely solved. The results obtained,
however, do allow one to compare the relative advantages of different situations
and to make a rough estimate of the likelihood of success. Generally speaking,
the analysis is applicable whenever the ratio of "on time" to "off time" is small
for each event.


1. Introduction. Our problem may be represented schematically as follows.
Consider two pulse waves (Fig. 1) of periods $T_1$, $T_2$, pulse widths $t_1$, $t_2$, and
phases $\phi_1$, $\phi_2$. It is desired that these pulses overlap at least once within a given
time interval; moreover, an overlap is not satisfactory unless its duration is at
least as great as some assigned $t_m$. The starting phases $\phi_1$ and $\phi_2$ are unknown
for both waves. Our problem, then, would appear to be to calculate as a function
of time the probability of at least one overlap of duration at least $t_m$.

This function is calculated later and, while mathematically adequate, is of no
value for practical application. This rather unusual situation in applied
mathematics arises from sources generally kept in mind only by [...]. [...],
involving as it always does at some stage the use of the human senses, precludes
[...] values of the parameters of the problem. In other words, although
experimental error can sometimes be made amazingly small, [...].

The difficulty arises from the possibility that the waves may "lock in step",
so that the probability function is extremely erratic with respect to very minute
changes in the periods $T_1$, $T_2$. For example, let $T_1 = T_2 = 10t_1 = 10t_2$ (and
$t_m = 0$); a diagram readily shows that, for all times greater than $T_1 = T_2$, the
probability is 0.2; but if we let $|T_1 - T_2| = \epsilon$, one wave will "creep up"
on the other so that for times greater than $T_1T_2/\epsilon$ the probability is
unity. Thus it may very well happen in a practical application that the parameters
are known with an accuracy essentially sufficient only to give the obvious
result: $0 \le P \le 1$.

¹ [...] Harvard University [...] responsibility for the accuracy of the
statements contained herein.






In the practical problem originally considered, uncertainty in the data arose
not only from experimental error but also from slight instability of equipment.
Thus some sort of averaging over variations in the periods had to be found
if the final results were to be of any practical value whatsoever.
For reasons which will become clear in the later analysis, this smoothing entails
difficulties which the author was unable to overcome with any great success; the
nature of the results which have been obtained is discussed in the next section.
These results involve several approximations which, generally speaking, are [...]




We shall continue to use the notation already introduced:

(1)   $t_1, t_2$ = durations of the events;
      $T_1, T_2$ = periods of the events;
      $t_m$ = minimum satisfactory duration of coincidence; and
      $P$ = probability of at least one satisfactory coincidence.

We shall also use the (at present) rather arbitrary notation:

(2)   $t = (\text{time} - t_m)$,
      $P_0 = (t_1 - t_m)(t_2 - t_m)/T_1T_2$,
      $w = (t_1 + t_2 - 2t_m)/T_1T_2$.

The probability function for short time intervals is:

(3)   $P = P_0 + wt$,   for $t < \mathrm{Max}(T_1, T_2)$.

In any case:

(4)   $P \le P_0 + wt$.

As already explained, the functional dependence of $P$ for large $t$ is of no practical
use due to its extremely erratic variation with small changes in the periods
$T_1$, $T_2$.
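As a numerical illustration of (2) to (4), the quantities $P_0$, $w$ and the bound $P_0 + wt$ may be computed as in the Python sketch below. The function names are hypothetical, and the sketch takes $P_0 = (t_1 - t_m)(t_2 - t_m)/T_1T_2$ and $w = (t_1 + t_2 - 2t_m)/T_1T_2$; by (3) the returned value is the probability itself only for short times, and otherwise it is merely the upper bound (4).

```python
def base_quantities(t1, t2, T1, T2, tm):
    """Return (P0, w) as defined in (2); all arguments share one time unit."""
    p0 = (t1 - tm) * (t2 - tm) / (T1 * T2)
    w = (t1 + t2 - 2.0 * tm) / (T1 * T2)
    return p0, w


def short_time_bound(t1, t2, T1, T2, tm, elapsed):
    """Bound (4), P0 + w*t with t = elapsed - tm, clipped to the range [0, 1]."""
    p0, w = base_quantities(t1, t2, T1, T2, tm)
    t = elapsed - tm
    if t < 0.0:
        return 0.0          # no satisfactory overlap can be complete before tm
    return min(1.0, p0 + w * t)
```

For the interlocked example of section 1 ($T_1 = T_2 = 10$, $t_1 = t_2 = 1$, $t_m = 0$) this gives $P_0 = 0.01$ and $w = 0.02$.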

For reasons which will later become apparent, the only type of averaging which
has yet been carried to completion is the following. Consider that many trials
of equal length are made and that in each individual trial, all the parameters
remain constant with absolute, mathematical precision. Assume for definiteness
that $T_2 < T_1$. Between different trials, let $t_2$ and $T_2$ vary in such a way
that the ratio of the periods takes all values within a range with equal
probability. (In the original problem, the ratios $t_i/T_i$ necessarily
remained constant.) The quantity $f$ given below then represents that fraction
of the trials in which $P < P_0 + Q$. The smaller $f$ is, the greater are the
chances of success.

It must be admitted that this method assumes several things which are not
true in practice. First, the parameters of the problem [...] as much as 33%
variation in $T_2$. [...] practical problem [...] analysis can be carried
through. [...] It would then seem that a simple integration would yield a true
probability. [...] the formulas for $f$ are reasonably accurate only
for [...]. The final formula for $f$ = fraction of trials in which $P < P_0 + Q$ is:

(5)   $f = 1$   for $tw < Q$,
      $f = 1.216\,Q\left[1 + \dfrac{tw - Q}{Q}\log\left(1 - \dfrac{Q}{tw}\right)\right]$   for $tw > Q$, $Q < 1/2$.

This result is subject to error from several sources. First, it is an approximation
to a number-theoretic formula given in (31); this approximation is best
for $t$ and $Q/w$ large compared to $\mathrm{Max}(T_1, T_2)$. A completely general comparison
of (5) and (31) is given in Fig. 2, where the agreement will be seen to be
fairly good even for relatively small $t$ and $Q/w$. (The dotted contours are
straight lines passing through the origin.) When $t$ and $Q/w$ are small this first
source of error can be eliminated by using the solid contours of Fig. 2 in place
of (5).

Secondly, formula (31) is an approximation and involves the use of
simplified probability formulas and an assumption that $P_0$ and $w$ are constant
as $T_2$ varies. The maximum possible magnitude of these errors in (31) is given
by (indicating functional dependence):

(6)   $f(\bar{w}, Q - p_0 - q) < f(w, Q) < f(\underline{w}, Q + p_0 + q)$,

where, as $T_2$ varies,

      $\underline{w}, \bar{w}$ = minimum, maximum values of $w$,
      $p_0$ = maximum change in $P_0$,
      $q$ = maximum value of $w^2T_1T_2$.

Generally speaking, these errors are small if $t_i/T_i$ are small and if $t$ is large
compared to $T_1$, $T_2$. Also, there is considerable possibility that certain errors
will cancel in such a way as to make (6) correct with $q = 0$.

We shall now outline the practical use of these results. Given nominal values
of the parameters defined in (1), choose a convenient value for $Q < 1/2$ (usually
[...]), and substitute into (2) to find $tw/Q$. From (5), one may then determine
$f$ = fraction of trials in which $P < P_0 + Q$. (Low values of $f$ are thus desirable.)
For computational convenience, (5) has been plotted in Fig. 3, while, above the
range of Fig. 3, the following is within 1% of (5).

(7)   $f \approx 0.608\,(Q^2/tw)$   for $tw > 10Q$.

Note also that (4) may often be of considerable use in quickly eliminating cases
of very poor probability, and recall also that (3) will give the true, directly
meaningful probability whenever $t$ is no greater than $\mathrm{Max}(T_1, T_2)$.
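The recipe just outlined can be sketched in a few lines of Python. This is an illustration under stated assumptions: formula (5) is read as $f = 1$ for $tw < Q$ and $f = 1.216\,Q[1 + ((tw - Q)/Q)\log(1 - Q/tw)]$ for $tw > Q$, the logarithm is taken to be natural, and the function name is hypothetical.

```python
import math


def fraction_of_poor_trials(tw, Q):
    """Approximate fraction f of trials with P < P0 + Q, following (5) as read above."""
    if not 0.0 < Q < 0.5:
        raise ValueError("the formula is stated only for 0 < Q < 1/2")
    if tw <= Q:
        return 1.0                      # first branch of (5)
    x = Q / tw                          # lies strictly between 0 and 1 here
    # (tw - Q)/Q equals 1/x - 1
    return 1.216 * Q * (1.0 + (1.0 / x - 1.0) * math.log(1.0 - x))
```

Expanding the logarithm for large $tw/Q$ leaves the leading term $0.608\,Q^2/tw$, so that only a short table or plot is needed for moderate $tw/Q$.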

Evaluation of the maximum possible error in $f$ as so obtained is more
complicated. If $t$ and $Q/w$ are small, Fig. 2 may be used to eliminate inexactness
due to the approximation of (31) by (5) = (33). Otherwise, this error may
safely be assumed to be negligible (less than 0.02; (31) may be employed
directly, but this is laborious unless $Q/w$ is small). The remaining errors, given
by (6), may change depending on how $T_2$ is assumed to vary. To make these
bounds as close as possible, it is best to choose $T_2 = \mathrm{Min}(T_1, T_2)$ and then let
[...] and merely means that the "lock in" [...] have an effect when $t$ becomes
greater than $Q/w$.

[Fig. 2. Contours of $f/Q$: (31) solid, (33) dotted.]








3. The probability function. Our problem has already been represented by
the pulse waves of Fig. 1. The starting phases $\phi_1$, $\phi_2$ of the waves are random,
and we desire the probability $P$ of at least one overlap of duration at least $t_m$
within a given time interval. Manifestly $P = 0$ until time $t_m$; hence we shall
give $t$ the meaning already assigned in (2).

Consider any sub-interval of width $t_m$. The range of phases favorable to
satisfactory coincidence on this interval is easily seen to be a rectangle with
sides $(t_1 - t_m)$, $(t_2 - t_m)$ in the phase plane $(\phi_1, \phi_2)$. By proper choice of the
(arbitrary) zero-phase reference, the small rectangle favorable to coincidence on
$(0, t_m)$ can be made to fall in the lower left corner of the phase plane (Fig. 4).



As we allow the sub-interval (width $t_m$) to advance in time, this small rectangle
will sweep out a strip along a 45° line (Fig. 4); its horizontal displacement
(equal to its vertical displacement) is given by $t$ as defined in (2). Since the
phases must be measured modulo the periods, we must "switch back" the strip
whenever it begins to leave the large rectangle $0 \le \phi_1 \le T_1$, $0 \le \phi_2 \le T_2$;
this is illustrated in Fig. 5.

The desired probability is then the area covered at least once by the strip
divided by $T_1T_2$, the total available area of the phase plane.

Using Fig. 4, one can easily show that, before the strip begins to overlap itself:

(8)   $P = P_0 + wt$,

where $t$, $P_0$, $w$ are defined in (2).
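Formula (8) can be checked by direct simulation. The Python sketch below is only an illustration: the convention that each wave is "on" during the intervals $[a + nT_i,\ a + nT_i + t_i]$, with the first start $a$ uniformly distributed over one period before time zero, as well as the function names and the sample parameters, are assumptions of the example.

```python
import random


def coincidence_occurs(start1, start2, t1, t2, T1, T2, tm, elapsed):
    """True if some pair of pulses overlaps for at least tm inside [0, elapsed]."""
    a1 = start1
    while a1 < elapsed:
        a2 = start2
        while a2 < elapsed:
            begin = max(a1, a2, 0.0)
            end = min(a1 + t1, a2 + t2, elapsed)
            if end - begin >= tm:
                return True
            a2 += T2
        a1 += T1
    return False


def simulate(t1, t2, T1, T2, tm, elapsed, trials=50_000, seed=0):
    """Monte Carlo estimate of P with random, independent starting phases."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start1 = -rng.random() * T1     # most recent start of wave 1, in (-T1, 0]
        start2 = -rng.random() * T2     # most recent start of wave 2, in (-T2, 0]
        hits += coincidence_occurs(start1, start2, t1, t2, T1, T2, tm, elapsed)
    return hits / trials
```

With $t_1 = t_2 = 1$, $T_1 = 10$, $T_2 = 13$, $t_m = 0.2$ and an elapsed time of 5, the strip has not yet wrapped around the phase plane, and the estimate should agree with $P_0 + wt$ (about 0.064) to within sampling error.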

A rectangle with opposite sides identified, as in Fig. 5, is topologically
equivalent to a torus. This gives a good geometric picture of the overlap phenomena.
The strip winds diagonally about the torus until eventually (in general after
several full circuits) it strikes sufficiently near its starting point to overlap itself
on one edge. It then begins to fill the chinks between the previous circuits, and
this single overlap continues until the chinks are almost filled. The strip then
approaches its starting point from the side opposite to that on which single
overlap occurred. Thereafter, only the center section of the strip is effective in
increasing the area covered. This double overlap continues until the entire
torus has been covered. A degenerate case is possible in which the strip, upon
its first return, begins to retrace exactly its former path and the torus is never
fully covered. This corresponds to interlocking of the original waves of Fig. 1.

A rigorous proof of the above statements may be constructed by using the
fact that each change in behavior can occur only at the starting point. In this
manner, it is easily shown that: (a) single and double overlap occur in that order,
(b) the strip area effective in covering changes only upon a change in the type of
overlap, and (c) the two types of overlap must occur on opposite sides of the
starting point.

The facts (a, b, c) may then be used to derive the probability function. For
the analysis, it is best to return to the $(\phi_1, \phi_2)$ plane. Overlap of any
type will first occur when the "unswitched-back" strip approaches sufficiently
near a point $(n_1T_1, n_2T_2)$ where $n_1$ and $n_2$ are non-negative integers not both zero.
The analysis is greatly shortened by noticing that the behavior is completely
determined by the distance of the line $\phi_1 = \phi_2$ from such points (even though
the strip is not centered on this line), while the width of the strip is (Fig. 4)
$wT_1T_2/\sqrt{2}$.

A slight fine-structure may arise in the probability function where it changes
slope, depending on whether or not the leading corner of the moving rectangle
strikes one of the sides of the original small rectangle. These effects are small
if $t_i/T_i$ are small and will be neglected below by supposing the strip to be
generated by a line segment oriented perpendicularly to its path. The error arising
from this procedure consists essentially in a delay or advance in the time at
which $P$ changes slope. It may be seen that the maximum effect represents a
delay of $\Delta t = wT_1T_2/2$. The error introduced is then less than $w\,\Delta t$ multiplied
by that portion of the total width of the strip which becomes ineffective due to
the overlap considered. The sum of these effects must be less than that given
by using the total width of the strip; this gives the maximum error $w^2T_1T_2/2$.

The results of the method outlined are then as follows. Single overlap occurs
at $t = s$, where

(9)   $s = \tfrac{1}{2}(m_1T_1 + m_2T_2)$,

and $(m_1, m_2)$ is that pair of non-negative integers not both zero such that $s$ is a
minimum and

(10)   $p_1 = \left|\dfrac{m_1}{T_2} - \dfrac{m_2}{T_1}\right| < w$.

Double overlap occurs at $t = d$, where

(11)   $d = \tfrac{1}{2}(n_1T_1 + n_2T_2)$,

and $(n_1, n_2)$ is that pair of non-negative integers not both zero such that $d$ is a
minimum and the conditions

(12)   $\left|\dfrac{n_1}{T_2} - \dfrac{n_2}{T_1}\right| < w$,   $\left(\dfrac{m_1}{T_2} - \dfrac{m_2}{T_1}\right)\left(\dfrac{n_1}{T_2} - \dfrac{n_2}{T_1}\right) < 0$

are satisfied. If we set

(13)   $p_2 = p_1 + \left|\dfrac{n_1}{T_2} - \dfrac{n_2}{T_1}\right| - w$,

the probability function is then

(14)   $P = P_0 + wt$   for $t \le s$,
       $P = P_0 + sw + (t - s)p_1$   for $s \le t \le d$,
       $P = P_0 + sw + (d - s)p_1 + (t - d)p_2$   for $d \le t$,

where it is understood that $P = 1$ if (14) gives $P > 1$.

The degenerate case where the waves interlock is given correctly by this
formalism. Namely, if the strip starts to retrace its path exactly, then $p_1 = 0$
and the second part of (12) shows that $d$ does not exist. Equation (14) then
gives the correct result: $P$ rises to the value $P_0 + sw$ and never increases further.
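The prescription (9) to (14) may be tried out with the following Python sketch. It is a brute-force illustration, not the author's procedure: the function names and the search limit `max_index` are assumptions, the signed quantity in (10) and (12) is taken to be $m_1/T_2 - m_2/T_1$, and a double overlap lying beyond the search limit is simply reported as absent.

```python
def overlap_events(T1, T2, w, max_index=50):
    """Search the integer pairs of (9)-(13) by brute force.

    Returns (s, p1, d, p2); d and p2 are None when no double overlap is found,
    and the whole result is None when no single overlap is found either.
    """
    candidates = sorted(
        (0.5 * (a * T1 + b * T2), a, b)
        for a in range(max_index + 1)
        for b in range(max_index + 1)
        if a or b
    )
    single = None
    for time, a, b in candidates:
        offset = a / T2 - b / T1        # signed quantity inside (10) and (12)
        if abs(offset) >= w:
            continue
        if single is None:
            single = (time, offset)     # earliest admissible pair: single overlap
        elif offset * single[1] < 0:    # opposite side of the start: double overlap
            p1 = abs(single[1])
            return single[0], p1, time, p1 + abs(offset) - w
    if single is None:
        return None
    return single[0], abs(single[1]), None, None


def probability(t, P0, w, events):
    """Piecewise-linear P of (14), capped at unity."""
    s, p1, d, p2 = events
    if t <= s:
        value = P0 + w * t
    elif d is None or t <= d:
        value = P0 + s * w + (t - s) * p1
    else:
        value = P0 + s * w + (d - s) * p1 + (t - d) * p2
    return min(1.0, value)
```

For the interlocked case $T_1 = T_2 = 10$, $w = 0.02$ the search returns $s = 10$, $p_1 = 0$ and no double overlap, so that the computed $P$ stays at $P_0 + sw$ for all later times, as described above.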


4. The method of smoothing. We have already discussed in section 1 the
inadequacy of the formal mathematical solution (14) for purposes of practical
application. Either mathematical analysis or intuitive consideration of interlock
shows that the erratic behavior of $P$ is due almost entirely to small changes
in the ratio $T_1/T_2$. As this ratio passes through certain rational values,
possibilities of interlock appear and disappear. Consequently, we next alter (14)
to a form in which the dependence on this ratio is more evident.

We may, without loss of generality, assume:

(15)   $T_1 = 1$,   $T_2 < 1$.

Also introduce the standard notation:

(16)   $[x]$ = (largest integer $\le x$).

It will then be seen that (10) and (12) may be thrown into the form:²

(17)   $k$ = smallest positive integer such that $p_1 = |ke - i| < w$   ($i$ = integer);

(18)   $K$ = smallest positive integer such that $|Ke - I| < w$ and also
       $(ke - i)(Ke - I) < 0$   ($I$ = integer),

where either 



Now from (9) and (10), we note that $s$ differs from $m_1T_1$ by at most $wT_1T_2/2$,
while from (11) and (12), $d$ differs from $n_1T_1$ by less than the same amount.
Moreover, by the second half of (12), $d$ is thereby made too small if $s$ has been
made too large and vice versa. Hence the use of these approximations in (14)
will contribute an error certainly less than $w^2T_1T_2/2$. Adding the error
discussed in section 3, the total introduced thus far cannot exceed $w^2T_1T_2$.

We thus use in the present notation $s = k$, $d = K$; (13) and (14) then become:

(20)   $p_2 = p_1 + |Ke - I| - w$,

(21)   (a) $P = P_0 + wt$,   for $t \le k$,
       (b) $P = P_0 + kw + (t - k)p_1$,   for $k \le t \le K$,
       (c) $P = P_0 + kw + (K - k)p_1 + (t - K)p_2$,   for $K \le t$,


where, as before, $P = 1$ if (21) gives a value greater than unity. Equations
(17)-(21) are the formulation which will be used, with conditions (15), henceforth.

We wish now to smooth $P$ with respect to variations in $e$. The number-theoretic
requirement (17) is extremely difficult to work with. For reasons of
simplicity, then, we shall assume that $e$ is the only parameter which changes as
$T_2$ is varied. The errors which may arise from this assumption are treated at
the end of section 5.

² Although the periods appear explicitly only in (19) hereafter, all the [...].
(This is evident if we recall that $w$ has the dimensions of inverse time.) Thus
we are definitely assuming that $T_1$ = constant.

From (19) or from the absolute value signs in (17), (18) it will be seen that
all possible situations arise if $e$ varies merely from zero to one-half. In order
that this should entail as little variation in $T_2$ as possible, our conventions should
be chosen as already stated in (15). Even under these circumstances, a maximum
variation of 33% in $T_2$ may be required to cover the range $e = 0$ to $\tfrac{1}{2}$.

Equation (21) cannot be used directly without the interpretational convention
there noted. This leads to difficulties of treatment which the author was unable
to solve. The difficulties may be avoided by the following device, which
admittedly has less direct significance than an averaged value for $P$.

We enquire after the fraction $f$ of the range of $e$ over which $P$ has a value (at
fixed $t$) less than some given value, $Q + P_0$. We may then say that, if a large
number of trials each of length $t$ is made, then in a fraction $f$ of them, the
probability of coincidence will be less than $Q + P_0$.

5. Calculation of $f$. The exceptional behavior of $P$ is that caused by interlock
possibilities. This corresponds to $p_1 = 0$ in (17). Thus the exceptional values
of $P$ center about the points $e = i/k$, where $i$ and $k$ are relatively prime
(otherwise, $k$ would not be the smallest integer satisfying (17)). Moreover, by a
standard theorem [1], $k < 1/w$. Thus the critical points form the Farey series
of order $1/w$ in the range $(0, \tfrac{1}{2})$. About each Farey point, we may suspect that
there will be an interval over which $k$ is constant, and that the entire range may
thereby be divided up into ranges of constant $k$.

In thinking about the use of (17) in a typical calculation, it is convenient to
eliminate the integer $i$ by representing multiples of $e$ as a series of points
progressing around and around a circle of unit circumference. When $e = i/k$, the
$k$th multiple will (after $i$ revolutions) coincide with the origin; this and the
earlier points, it is easily shown, will be distributed uniformly about the circle
with a separation $1/k$.

As $e$ moves away from the Farey point, $k$ will, by definition (17), remain
constant until either (a) the point $ke$ moves a distance greater than $w$ from the
origin or (b) an earlier point moves to a distance less than $w$ from the origin
(Fig. 6).
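The picture of multiples of $e$ progressing around a circle of unit circumference translates directly into a short search for the integer $k$ of (17). The Python sketch below is an illustration only; the function name and the cut-off `k_max` are assumptions of the example.

```python
def smallest_k(e, w, k_max=10_000):
    """Return (k, p1) of (17): the first k whose multiple k*e lies within w of an integer."""
    for k in range(1, k_max + 1):
        multiple = k * e
        p1 = abs(multiple - round(multiple))   # distance of k*e from the nearest integer
        if p1 < w:
            return k, p1
    return None                                 # nothing found below the cut-off
```

At the Farey point $e = 0.4$ with $w = 0.025$ the search stops at $k = 5$ with $p_1 = 0$, whereas a displacement to $e = 0.41$ carries $5e$ more than $w$ from the origin and the search runs on to $k = 22$.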

Let $(me)$ be that earlier point nearest (initially $1/k$ from) the origin and moving
toward it as $e$ varies in a particular direction. Of course,

(22)   $m < k$.

For each Farey point, there will be two values of $m$: one for decreasing $e$ and
one for increasing $e$. If we introduce the new variable $h$ = the absolute value
of the change in $e$ from the Farey point $i/k$, then each point, $ne$, on the reference
circle will move a distance $nh$, and (17) gives as the conditions for constant $k$
(Fig. 7):

(23)   (a) $w > kh = p_1$,
       (b) $mh < (1/k) - w$.



Thus we have divided the range $(0, \tfrac{1}{2})$ into small ranges where $k$ is
fixed. The number of small ranges is roughly twice the number of Farey points.
Within each small range $p_1$, $K$, $p_2$ still vary with $e$. The behavior of $p_1$ is
already given in (23a); we shall find that we do not need $p_2$. Using (18) and
Fig. 7, it may easily be shown that:

[Fig. 6. Multiples of $e$ on the reference circle; legible panel labels: $e = 0.405$, $w = 0.025$; $e = 0.375$, $e = 0.382$.]

(24)   $K = m + jk + k$,

where

(25)   $j + a = (1 - mkh - kw)/k^2h$,   $j = [j + a]$,   $0 \le a < 1$.

From (23a), (24), (25), we obtain:

(26)   $(K - k)p_1 = 1 - kw - ak^2h$   $(0 \le a < 1)$.

Having thus divided the range of $e$ into small regions within each of which the
number-theoretic requirements (17, 18) take a relatively simple form, we must
now turn to the calculation of $f$ = that fraction of the range $e = (0, \tfrac{1}{2})$ over which
$P < P_0 + Q$ at fixed $t$. We shall specialize the further analysis to the case
$Q < \tfrac{1}{2}$. This considerably shortens the discussion and yields essentially all the
useful results of the more general inquiry.

We first note from (21) that, since $p_2 < p_1 < w$ (i.e. because of (4)), we have
$P < P_0 + Q$ independently of $e$ if $t < Q/w$:

(27)   $f = 1$,   for $t < Q/w$.

Similar reasoning shows on the other hand that, when $t > Q/w$, those regions
with $k \ge Q/w$ do not contribute to $f$. In the following, we shall therefore
employ:

(28)   $k < Q/w < t$,   $Q < \tfrac{1}{2}$.

Equation (28) shows that we must use either (21b) or (21c); we shall next
show that we do not need (21c). The value of $P$ whenever (21c) is applicable is
certainly greater than $(P_0 + kw + (K - k)p_1)$. From (26), this value is equal to
$(P_0 + 1 - ak^2h)$. Now from (28), $w < 1/2k$, whence by (23a) $h < 1/2k^2 < 1/2ak^2$.
Thus $(P_0 + 1 - ak^2h) > P_0 + \tfrac{1}{2} > P_0 + Q$, and consequently
(21c) does not apply until $P > P_0 + Q$. (This means merely that the double
overlap discussed in section 3 cannot occur until at least half the torus is covered.)
Accordingly, we can confine our attention entirely to (21b) in any further
discussion of $f$.



Substituting for $p_1$ from (23) and recalling that $(t - k)$ is positive (by (28)),
we find from (21b) that the condition $P < P_0 + Q$ becomes:

(29)   $h < \dfrac{Q - kw}{k(t - k)}$.

However, $h$ is subject also to the restrictions (23), which insure that we do not
stray from the small region where $k$ is constant. We assert that (29) implies
(23) and may therefore be used as the final expression of the requirement
$P < P_0 + Q$.

To prove this, note first that (29) and (28) immediately give $h < w/k$, which
is (23a). Secondly, (28) implies $1/k > 2w$ so that, using (23a) and (22):
$(1/k) - w > w > kh > mh$, which is (23b).

Thus we arrive at the result that $f$ receives contributions only from those
elementary regions where $k$ satisfies (28) and that the contribution of each such
region is governed by (29).

Since the variable $h$ was defined as the absolute value of the change of $e$ from
the Farey point $i/k$, each Farey point (satisfying (28)) contributes an amount
equal to twice³ the right-hand side of (29). Since this amount is independent
of $i$, we may immediately sum over all Farey points $i/k$ with fixed $k$. There
are $\phi(k)/2$ such points³ in the range $(0, \tfrac{1}{2})$, where Euler's function is defined by:

(30)   $\phi(k)$ = the number of positive integers $\le k$ and relatively prime to $k$.

(Note that $\phi(k)$ is even for $k \ge 3$ since if $k$ and $i$ have no common divisor $> 1$,
neither do $k$ and $k - i$.)

Thus, summing over all these contributions and dividing by the length of the
total range:

(31)   $f = 2\sum_{k=1}^{k < Q/w} \phi(k)\,\dfrac{Q - kw}{k(t - k)}$.

³ This is not true of the Farey points 0 and $\tfrac{1}{2}$, the ends of the range of $e$, but the
terms $k = 1, 2$ in (31) correctly account for those contributions since $\phi(1) = \phi(2) = 1$.

Regarding error in (31) due to the inaccuracy of (21), note that this can enter
only when we set $P = P_0 + Q$ in deriving (29). Actually the difference between
(21b) and the correct value of $P$ will change as $e$ is changed so that there is
considerable possibility that these effects will cancel out in (31). (In fact, a
detailed study shows that the error in (21b) assumes opposite signs as $e$ varies in
opposite directions from any given Farey point.) In any case, because (31) is
monotone in $Q$, the error in (31) can be no greater than that found by
substituting $Q \pm w^2T_1T_2$ for $Q$. Taking account also of the variation of $P_0$ with $T_2$,
the same argument establishes the "$Q$-dependence" of (6) given in section 2.

Finally, we investigate the error due to change in $w$ with $T_2$. If $\bar{w}$ is the
maximum value of $w$, Farey points with $k < Q/\bar{w}$ are certain to contribute to $f$, and
this contribution will be at least as great as $(Q - k\bar{w})/k(t - k)$ so that $f > f(\bar{w})$.
On the other hand, if $\underline{w}$ is the minimum value of $w$, Farey points with $k > Q/\underline{w}$
cannot possibly contribute to $f$, and the remaining points can contribute no
more than $(Q - k\underline{w})/k(t - k)$ so that $f < f(\underline{w})$. Hence we arrive at the final
statement (6) in section 2.

6. Approximations for $f$. Computational difficulties in the use of (31)
suggested approximating it by a more readily computed expression. By a standard
theorem [1, p. 266], in the mean:

(32)   $\phi(k) \approx 6k/\pi^2$.

We may then approximate (31) by:

$f = 1.216 \displaystyle\int_1^{Q/w} \frac{Q - kw}{t - k}\,dk = 1.216\,Q\left[1 - \frac{w}{Q} - \frac{tw - Q}{Q}\log\frac{tw - w}{tw - Q}\right]$.

If $Q/w$ is large compared to 1 (recall $t > Q/w$), this becomes very nearly

(33)   $f = 1.216\,Q\left[1 + \dfrac{tw - Q}{Q}\log\left(1 - \dfrac{Q}{tw}\right)\right]$.

Despite the cavalier derivation of (33), its agreement with (31) is remarkably
close. Fig. 2 allows a perfectly general comparison of (31) and (33), where the
agreement will be seen to be fairly good even for $t$ and $Q/w$ of the order of 4 or 5.
Note also that (33) nearly always gives a value of $f$ that is too large.

For completeness, we may repeat (27):

(34)   $f = 1$   for $t < Q/w$.

Note that only the dimensionless quantities $tw$, $Q$ enter into (33, 34) which are
therefore independent of the normalization (15).
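The closeness of (33) to (31) can be examined numerically. The Python sketch below is an illustration under stated assumptions: (31) is read as $f = 2\sum_{k < Q/w} \phi(k)(Q - kw)/k(t - k)$, (33) as $f = 1.216\,Q[1 + ((tw - Q)/Q)\log(1 - Q/tw)]$ with a natural logarithm, times are measured in units of $T_1$ as in (15), and the function names are hypothetical.

```python
from math import gcd, log


def euler_phi(k):
    """Euler's function (30): count of integers 1..k relatively prime to k."""
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)


def f_exact(t, w, Q):
    """Number-theoretic sum (31) as read above, with (27) for t*w <= Q."""
    if t * w <= Q:
        return 1.0
    total, k = 0.0, 1
    while k * w < Q:                    # only k < Q/w contributes, by (28)
        total += euler_phi(k) * (Q - k * w) / (k * (t - k))
        k += 1
    return 2.0 * total


def f_approx(t, w, Q):
    """Closed-form approximation (33) as read above, with (34) for t*w <= Q."""
    tw = t * w
    if tw <= Q:
        return 1.0
    return 1.216 * Q * (1.0 + (tw - Q) / Q * log(1.0 - Q / tw))
```

With $t = 100$, $w = 0.01$ and $Q = 0.25$, for instance, both functions give a value of $f$ near 0.042, the approximation being slightly the larger.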

REFERENCE

[1] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon
Press, Oxford, 1938, p. 30.



NONPARAMETRIC ESTIMATION, III. STATISTICALLY EQUIVALENT
BLOCKS AND MULTIVARIATE TOLERANCE
REGIONS-THE DISCONTINUOUS CASE

By John W. Tukey
Princeton University

1. Summary. In Paper II of this series [2, 1947] it was shown that if n
functions and a sample of n were used to divide the population space into n + 1
blocks in a particular way, and if the joint cumulative of the functions were
continuous, then the n + 1 fractions of the population, corresponding to the n + 1
blocks, were distributed symmetrically and simply.

In Paper I of this series [1, 1945] it was shown that the one-dimensional theory
of tolerance regions could be extended to the discontinuous case, if equalities were
replaced by inequalities.

In this paper the results of Paper II will be extended to the discontinuous case
with the same weakening of the conclusion. The devices involved are more complex,
but the nature of the results is the same (see Section 6).

As a tool, it is shown that any n-variate distribution can be represented in
terms of an n-variate distribution with a continuous joint cumulative (in fact,
with uniform univariate marginals), where each variate of the given distribution
is a different monotone function of the corresponding variate from the continuous
distribution.


2. Introduction. The importance of extending the simple results of the
continuous case to the more complex results of the discontinuous case may not
be clear at first thought. Yet all the data with which the statistician actually
works comes from discontinuous distributions. Often these distributions are very
fine-grained: the distributions of the number of eggs laid by codfish and of the
measured wavelengths of a spectral line (measured in 0.000001 Å) do not have
large concentrated probabilities, but all their probability is concentrated at
discrete points. Insofar as the considerations of the theoretical statistician apply
to the data as received rather than to the "data" of a more or less imaginary
model, these considerations apply to data with a discrete distribution. When
his theories are erected on a basis of a probability density function, or even a
[...] extrapolation from theory to practice. [...] mathematical statistician to
study [...] effects and the pleasant small effects [...] discrete data, and must
sooner or later face [...].

[...] discontinuous case, we must face two problems (we
assume that the reader is familiar with Paper II [2]):

(1) What to do about “ties”? 




(2) Finite probabilities associated with cuts.

The first of these is peculiar to the multivariate situation and can be easily
explained by an example. Consider the three points in the plane with coordinates
(1, 9), (3, 9) and (2, 6). Let the first two functions be y and x; then the
procedure of Section 4 of Paper II [2] is not unique, and two possibilities arise:

Alternative A. (1, 9) is selected as having the largest y, and (3, 9) as having the
largest x among the remaining (two) points, hence $S_1 = \{(x, y) \mid y > 9\}$,
$S_2 = \{(x, y) \mid y < 9, x > 3\}$, $S_3 = \{(x, y) \mid y < 9, x < 3\}$.

Alternative B. (3, 9) is selected as having the largest y, and (2, 6) as having the
largest x among the remaining (two) points, hence $S_1' = \{(x, y) \mid y > 9\}$,
$S_2' = \{(x, y) \mid y < 9, x > 2\}$, $S_3' = \{(x, y) \mid y < 9, x < 2\}$.

Notice that $S_2 \ne S_2'$. The procedure is not unique. In the continuous case,
ties happen with probability zero, hence their consequences could be neglected.
This is now no longer the case.

This difficulty is solved by using more functions and the idea of lexicographical
(like a dictionary!) ordering. In the simplest case, we add no new functions and
proceed as follows: If there is a unique $i$ for which $\varphi_1(w_i)$ is maximal, select it.
Otherwise look among the $w_i$ for which $\varphi_1(w_i)$ is maximal, and look at the values of
$\varphi_2(w_i)$. If there is a unique such $i$ for which $\varphi_2(w_i)$ is maximal, select it. If not,
go on to $\varphi_3(w_i)$, and so on. This procedure leads to a specific $i$ unless
$\varphi_h(w_j) = \varphi_h(w_k)$ for all $h$ and some $j \ne k$. But in this case it does not matter
whether $j$ or $k$ is selected; the set of m-tuples $(\varphi_1(w_i), \varphi_2(w_i), \ldots, \varphi_m(w_i))$
remaining will be the same, although the indices $i$ will not. But the indices play
no role in the actual construction.

As an example, consider the following 20 four-letter words as a sample and let
there be four functions, $\varphi_i$ being the negative of the position in the alphabet of
the $i$-th letter of the word. (Thus a > b > c > ... > z.)

Sample: meet, west, made, gone, come, back, said, that, maid, well, with, with,
just, week, very, near, edge, this, hast, have. (The Law of the Three Just Men,
Edgar Wallace, pp. [...].)

Selections: back, made, near, (gone, come, edge, have. The fourth selection to
be made at random among these four.) The inferences which can be made about
the four-letter words in Edgar Wallace's writing vocabulary are left to the reader.

We have just given one rule for breaking ties, one which chooses Alternative B in our example. But we might prefer a rule which chooses Alternative A. To get more generality, we have only to take $M$ functions, $M > m$, and let $\varphi_{p(1)}, \varphi_{p(1)+1}, \cdots, \varphi_M$ (where, we may suppose, $p(1) = 1$ without loss of generality) play the role just taken by $\varphi_1, \varphi_2, \cdots$. Thus if the maximum of $\varphi_1(w)$ is not unique proceed to $\varphi_2(w)$, thence to $\varphi_3(w)$, $\cdots$, thence to $\varphi_M(w)$. For the second block, start with $\varphi_{p(2)}$, then $\varphi_{p(2)+1}, \varphi_{p(2)+2}, \cdots, \varphi_M$. And so on. The choice

$\varphi_1(x, y) = y$,

$\varphi_2(x, y) = -x$,

$\varphi_3(x, y) = x e^{y}$,

$\varphi_4(x, y) = x$,

$\varphi_5(x, y) = y^2$,

with $p(1) = 1$ and $p(2) = 4$, leads to Alternative A above. (Note that $\varphi_3$ is a dummy in the sense that it is never used.) The problem of ties, which was a problem in uniqueness of construction, is thus dealt with.

Next we must deal with the cuts. When we made $S_1$, $S_2$ and $S_3$ in Alternative A, we omitted some points, namely

$T_1 = \{(x, y) \mid y = 9\}$, and $T_2 = \{(x, y) \mid y < 9,\ x = 3\}$.

In the continuous case this did not matter, since these sets had probability zero and could be avoided. Here they cannot, and we shall have to consider a family of blocks (in the wide sense) as consisting of the blocks $S$ and the cuts $T$. The solution of the univariate case in Paper I [1] shows us that what we must expect is that:

$\Pr\{\text{coverage of } S_i + T_{i-1} + T_i > t\} \ge \Pr\{\text{coverage of one continuous-case block} > t\} \ge \Pr\{\text{coverage of } S_i > t\}$.

That is, if we want a certain set of blocks to cover (together) at least a certain amount with a certain probability we must add the adjoining cuts; and if we want a certain set of blocks to cover at most a certain amount with a certain probability we may add only those cuts which do not adjoin blocks not in our set. By introducing the cuts explicitly, we solve the second problem.

In order to reduce the size of the cuts, our detailed definitions will differ in detail from those which we have used so far. In the example, where the functions leading to Alternative A are used, we place in $S_1$ not only the points with $y > 9$, but also those with $y = 9$ and $-x > -1$; we place in $S_2$ not only the points with $y < 9$ and $x > 3$ and the points with $y < 9$, $x = 3$, $y^2 > 81$, but also those with $y = 9$ and $-x < -1$. Proceeding in this way, we reduce $T_1$ to the point $x = 1$, $y = 9$ and $T_2$ to the point $x = 3$, $y = 9$. This reduction can only diminish the probability associated with the cuts, but we cannot be sure that it will reduce it to zero.

Only in the case where the probability that all functions shall tie together is zero do we return to the simplicity of the continuous case. This case is exceptional because it does not arise with discrete probabilities, and observations always involve discrete probabilities.

Th^3ofTmt-^V''''^*'’r "'^thoda. 

The proof of the main theorems depends on two facts:

(1) a representation theorem, (5.3), and

(2) a lemma, (6.1), which shows that $m$ functions would be enough if (i) the distribution were fixed, and (ii) cases of probability zero were neglected.

The representation theorem has been outlined in the summary. It is analogous to, but a definite extension of, the one used in Paper I [1]. It seems to be new in statement, though not in thought; it will surprise few probability theorists. The novel element is the monotonicity of the functions, which is utterly essential for our purposes.

The lemma allows us to reduce the general case to the case of no extra functions, where the reduction must be made differently for each underlying distribution. The reduced functions are then represented by the representation theorem and the results of Paper II [2] are taken over. The results are stated in a form independent of the underlying distribution and the particular representation, hence they apply in general.

The last paragraph stresses the principle common to Paper I [1] and this paper. It is natural to call it the "iceberg principle," and to sketch it as follows: "We have some information about the visible one-ninth of the iceberg, and we want to conclude something about this visible part. If we can imagine another eight-ninths, consistent with the part we know, and if using that we can prove something expressed solely in terms of the visible part, then this is the required proof. (The only essential is to be able to match every visible part.)" Both the reduced functions (which depend on the underlying distribution) and the uniform variables used to represent them are part of the invisible eight-ninths which "could be there."

3. Terminology and Notation. In general we use the terminology and notation of Paper II [2], and we shall continue to assume that all functions concerned in the argument are measurable.

Given two finite sequences of the same length, we write $(a_1, a_2, \cdots, a_m) > (b_1, b_2, \cdots, b_m)$ if any of the following hold:

$a_1 > b_1$,

$a_1 = b_1$, and $a_2 > b_2$,

$a_1 = b_1$, $a_2 = b_2$, and $a_3 > b_3$,

$\cdots$

$a_i = b_i$ for $i < m$, and $a_m > b_m$.

This is the lexicographical order referred to above. (We interpret $(a_1, a_2, \cdots, a_m) < (b_1, b_2, \cdots, b_m)$ to mean $(b_1, b_2, \cdots, b_m) > (a_1, a_2, \cdots, a_m)$, and $=$ to mean identity.)
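The order just defined can be written out directly. The following minimal sketch (the function name `lex_greater` is an illustrative assumption) follows the definition term by term and agrees with Python's built-in comparison of tuples, which is also lexicographical.

```python
def lex_greater(a, b):
    """True when sequence a exceeds sequence b in the lexicographical order."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for x, y in zip(a, b):
        if x != y:
            # The first position where the sequences differ decides the order.
            return x > y
    return False  # identical sequences are equal, not greater
```

For instance, `lex_greater((9, -1), (9, -3))` compares the second entries because the first entries tie.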

3.1 Definition: Given a sequence of real-valued functions $\varphi_1, \varphi_2, \cdots, \varphi_M$ and a sequence of starting indices $p(1), p(2), \cdots, p(m)$, (which we shall often refer to, briefly, as an m-system of functions, $\varphi_1, \varphi_2, \cdots, \varphi_M$, without explicitly mentioning the starting indices), the functions $\Phi_1, \Phi_2, \cdots, \Phi_m$ are defined as follows:

(3.2)   $\Phi_k(w) = \{\varphi_{p(k)}(w), \varphi_{p(k)+1}(w), \cdots, \varphi_M(w)\}$,

the values of $\Phi_k$ being sequences of $M - p(k) + 1$ numbers. (In these terms, the rule for tie-breaking already explained becomes "select an $i$ for which $\Phi_k(w_i)$ is maximal (in the sense of lexicographical ordering)".)
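As a sketch of Definition 3.1, the snippet below builds the $\Phi_k$ from a list of functions and 1-based starting indices, then applies the rule "select a point for which $\Phi_k$ is maximal" to the three-point example. The helper name `make_Phi` and the representation of points as `(x, y)` tuples are assumptions made for illustration; the five functions and the starting indices $p(1) = 1$, $p(2) = 4$ are those quoted in the text for Alternative A.

```python
import math

def make_Phi(phis, starts):
    """Phi_k(w) = (phi_{p(k)}(w), ..., phi_M(w)) for 1-based starting indices p(k)."""
    return [lambda w, s=s: tuple(f(w) for f in phis[s - 1:]) for s in starts]

phis = [
    lambda w: w[1],                     # phi_1 = y
    lambda w: -w[0],                    # phi_2 = -x
    lambda w: w[0] * math.exp(w[1]),    # phi_3 = x e^y (never decisive here)
    lambda w: w[0],                     # phi_4 = x
    lambda w: w[1] ** 2,                # phi_5 = y^2
]
Phi = make_Phi(phis, starts=[1, 4])

remaining = [(1, 9), (3, 9), (2, 6)]
selected = []
for Phi_k in Phi:
    choice = max(remaining, key=Phi_k)  # tuples compare lexicographically
    selected.append(choice)
    remaining.remove(choice)
```

The list `selected` then holds the points chosen for the first and second cuts, in that order.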

4. The blocks and cuts determined by n points. 4.1 Definition: Given an m-system of functions $\varphi_1, \varphi_2, \cdots, \varphi_M$ and $n$ points $w_1, w_2, \cdots, w_n$, ($m \le n$), the corresponding blocks and cuts are given by the following procedure (the $\Phi$'s are defined in 3.1): first $i(1)$ is selected to maximize $\Phi_1(w_i)$, when

$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\}$,

$T_1 = \{w \mid \Phi_1(w) = \Phi_1(w_{i(1)})\}$.

Next, $i(2)$ is selected $\ne i(1)$ and to maximize $\Phi_2(w_i)$ among such $i$, when

$S_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) > \Phi_2(w_{i(2)})\}$,

$T_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) = \Phi_2(w_{i(2)})\}$.

$\cdots$ (the construction is perfectly analogous to II-4.1)

$S_{m|n+1} = \{w \mid \Phi_k(w) < \Phi_k(w_{i(k)}),\ k = 1, 2, \cdots, m\}$.

4.2 Definition: If $m = n$, then $S_{n|n+1}$ is also denoted by $S_{n+1}$.

If $m > n$, then only $\Phi_1, \Phi_2, \cdots, \Phi_n$ are used and $S_{n|n+1}$ is also denoted by $S_{n+1}$.

We denote by $\lambda$ a subset (possibly none, possibly all) of the indices $1, 2, \cdots, m$ and $m|n+1$ or, in case $m \ge n$, of the indices $1, 2, \cdots, n+1$.

4.3 Definition. The block-group $B_\lambda$ consists of the union of all $S_i$ with $i$ in $\lambda$ and all $T_i$ with both $i$ and $i + 1$ in $\lambda$ ($m + 1$ means $m|n+1$).

The closed block-group $\bar{B}_\lambda$ consists of the union of all $S_i$ with $i$ in $\lambda$ and all $T_i$ with either $i$ or $i + 1$ in $\lambda$.

Given any set we define its coverage as the proportion of the population falling into it (here the underlying probability distribution appears for the first time in this section), and we use

4.4 Definition: The coverage of $B_\lambda$ is denoted by $C(\lambda)$ and that of $\bar{B}_\lambda$ by $\bar{C}(\lambda)$.

Thus, given a family of functions $\varphi$ and $n$ points $w$, the space of the $w$ is divided into blocks and cuts, these are joined together into block-groups, and these block-groups have coverages. Thus, if the family of functions is fixed, the $n$ points determine these coverages, and, if the points are chance points, the coverages are chance numbers.
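The construction of this section can be illustrated on a small discrete population. The sketch below is not taken from the paper: the four-point population with equal probabilities, the helper names `classify` and `coverage`, and the choice $\Phi_1 = (y, x)$, $\Phi_2 = (x)$ with selected points (3, 9) and (2, 6) are all assumptions made for the illustration. The index $m + 1$ stands for the last block $S_{m|n+1}$.

```python
from fractions import Fraction

def classify(w, Phi, selected):
    """Return ("S", k) or ("T", k) for the block or cut that contains w."""
    for k, (Phi_k, w_sel) in enumerate(zip(Phi, selected), start=1):
        if Phi_k(w) > Phi_k(w_sel):
            return ("S", k)
        if Phi_k(w) == Phi_k(w_sel):
            return ("T", k)
    return ("S", len(Phi) + 1)  # below every selected point: the last block

def coverage(lam, mass, closed=False):
    """Coverage of the block-group (closed=False) or closed block-group (closed=True)."""
    total = Fraction(0)
    for (kind, k), p in mass.items():
        if kind == "S" and k in lam:
            total += p
        elif kind == "T":
            both = k in lam and (k + 1) in lam
            either = k in lam or (k + 1) in lam
            if (either if closed else both):
                total += p
    return total

Phi = [lambda w: (w[1], w[0]), lambda w: (w[0],)]
selected = [(3, 9), (2, 6)]
quarter = Fraction(1, 4)
population = {(1, 9): quarter, (3, 9): quarter, (2, 6): quarter, (5, 1): quarter}

mass = {}
for w, p in population.items():
    key = classify(w, Phi, selected)
    mass[key] = mass.get(key, Fraction(0)) + p
```

With `lam = {1, 2}` the open block-group picks up $S_1$, $S_2$ and $T_1$, while the closed one adds $T_2$ as well, so the closed coverage is never smaller.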

5. Statement of results. Having discussed the construction, we can now state the results.

(5.1) Theorem $A_{m|n+1}$. Let $\varphi_1, \varphi_2, \cdots, \varphi_M$ be any m-system of functions and let $W_1, W_2, \cdots, W_n$, where $m < n$, be a sample from any distribution, let the blocks, cuts, block-groups and coverages be formed, as described above, using the same (unknown) distribution for forming the coverages. Then, if $\alpha_1, \alpha_2, \cdots, \alpha_p$ are any set of $\lambda$'s (each $\lambda$ is a set of indices!),

$\Pr\{C(\alpha_1) < a_1,\ C(\alpha_2) < a_2,\ \cdots,\ \bar{C}(\alpha_k) > a_k,\ \cdots,\ \bar{C}(\alpha_p) > a_p\}$

$\ge \Pr\{t(\alpha_1) < a_1,\ t(\alpha_2) < a_2,\ \cdots,\ t(\alpha_k) > a_k,\ \cdots,\ t(\alpha_p) > a_p\}$,

where $t(\lambda) = \sum t_i$ for $i$ in $\lambda$, $t_{m|n+1} = t_{m+1} + \cdots + t_{n+1}$, and $t_1, t_2, \cdots, t_{n+1}$ have a uniform distribution on the barycentric simplex. (Compare Theorem $A_{m|n+1}$ of Paper II [2].)

In particular,

$\Pr\{C(i) < a\} \ge I_a(1, n) \ge \Pr\{\bar{C}(i) < a\}$,   $i = 1, 2, \cdots, m$,

where $I_a(1, n)$ is the incomplete Beta-function.
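For orientation, the bound $I_a(1, n)$ has a simple closed form, since $I_a(1, n) = n \int_0^a (1 - t)^{n-1}\,dt = 1 - (1 - a)^n$. The sketch below checks this numerically with a midpoint rule; the function name and the values $a = 0.1$, $n = 20$ are illustrative assumptions, not values from the paper.

```python
def incomplete_beta_1_n(a, n, steps=10_000):
    """Midpoint-rule value of I_a(1, n) = n * integral from 0 to a of (1 - t)^(n - 1) dt."""
    h = a / steps
    return sum(n * (1 - (i + 0.5) * h) ** (n - 1) for i in range(steps)) * h

a, n = 0.1, 20
numeric = incomplete_beta_1_n(a, n)
closed_form = 1 - (1 - a) ** n  # probability that a Beta(1, n) variate is below a
```

In the continuous case this is the probability that a single block covers less than $a$; in the present case it is sandwiched between the probabilities for the block alone and for the block with its adjoining cuts.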

(5.2) Theorem $B_{n+1}$. Let $\varphi_1, \varphi_2, \cdots, \varphi_M$ be any n-system of functions and $W_1, W_2, \cdots, W_n$ be a sample from any distribution. Then

$\Pr\{C(\alpha_1) < a_1,\ C(\alpha_2) < a_2,\ \cdots,\ \bar{C}(\alpha_k) > a_k,\ \cdots,\ \bar{C}(\alpha_p) > a_p\}$

$\ge \Pr\{t(\alpha_1) < a_1,\ t(\alpha_2) < a_2,\ \cdots,\ t(\alpha_k) > a_k,\ \cdots,\ t(\alpha_p) > a_p\}$,

where $t(\lambda) = \sum t_i$ for $i$ in $\lambda$ and $t_1, t_2, \cdots, t_{n+1}$ have a uniform distribution on the barycentric simplex. In particular,

$\Pr\{C(i) < a\} \ge I_a(1, n) \ge \Pr\{\bar{C}(i) < a\}$,   $i = 1, 2, \cdots, n + 1$.

For convenience of reference, we also state the representation theorem as:

(5.3) Theorem C. Let $X_1, X_2, \cdots, X_n$ have any joint n-variate distribution. Then there exist (real) functions $g_1, g_2, \cdots, g_n$ and a joint distribution for $U_1, U_2, \cdots, U_n$ such that,

(i) the marginal distribution of each $U_i$ is uniform on [0, 1],

(ii) each function $g_i$ is non-decreasing,

(iii) the distribution of $g_1(U_1), g_2(U_2), \cdots, g_n(U_n)$ is identical with that of $X_1, X_2, \cdots, X_n$.

6. The functions $\psi$. The aim of this section is to prove

(6.1) Lemma. Given any m-system of functions $\varphi_1, \varphi_2, \cdots, \varphi_M$, there exist real functions $\psi_1, \psi_2, \cdots, \psi_m$ such that, if $W_1, W_2, \cdots, W_n$ are a sample from the distribution concerned:

(6.2)   $\Pr\{\psi_i(W_j) = \psi_i(W_k)$, but $\varphi_{p(i)+h}(W_j) \ne \varphi_{p(i)+h}(W_k)$ for some $h \ge 0\} = 0$.

(6.3)   $\Pr\{\Phi_i(W_j)$ has a different relation to $\Phi_i(W_k)$ than that of $\psi_i(W_j)$ to $\psi_i(W_k)\} = 0$,

where by relation is meant >, =, or <.

The $\psi_i$ will depend on the underlying probability distribution. Thus they are useful in the proof, but could not replace the $\Phi_i$ in the statement of the theorems.

(6.4) Lemma. Let $\Phi(w)$ have its values in a totally ordered set (i.e. always either $\Phi_1 < \Phi_2$, $\Phi_1 = \Phi_2$ or $\Phi_1 > \Phi_2$) and let $W$ have a distribution. Consider the function

$\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}$.

Let $W_1, W_2, \cdots, W_n$ be a sample from the same distribution; then, with probability one, the relation (<, =, or >) between $\Phi(W_j)$ and $\Phi(W_k)$ is the same as that between $\psi(W_j)$ and $\psi(W_k)$.

If $\Phi(w_j) \le \Phi(w_k)$, then $\psi(w_j) \le \psi(w_k)$; if $\psi(w_j) < \psi(w_k)$, then $\Phi(w_j) < \Phi(w_k)$.

These follow directly from the definition. To prove the lemma, then, we must show that

(i) $\psi(w_j) = \psi(w_k)$ but $\Phi(w_j) < \Phi(w_k)$ occurs with probability zero.

We may clearly assume that the totally ordered set is complete, and that, in particular, it contains the symbols $-\infty$ and $+\infty$. Consider the real function of an abstract variable,

$F(s) = \Pr\{\Phi(W) < s\}$.

It is a monotone function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. We can therefore, given $\epsilon > 0$, select elements $-\infty = s_0 < s_1 < s_2 < \cdots < s_k = +\infty$ such that

$0 \le F(s_{i+1}) - F(s_i + 0) < \epsilon$.

If (i) occurs, then $\Phi(w_j)$ and $\Phi(w_k)$ belong either to the same open interval $(s_i, s_{i+1})$ or one belongs to an open interval and the other is its upper endpoint. The probability of either of these happening is at most

$\{F(s_{i+1}) - F(s_i + 0)\}^2 + 2\{F(s_{i+1}) - F(s_i + 0)\}\{F(s_{i+1} + 0) - F(s_{i+1})\}$.

Summing this over all intervals yields an estimate of

$2 \max_i \{F(s_{i+1}) - F(s_i + 0)\} < 2\epsilon$.

Since this goes to zero, the lemma is established.
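For a distribution with finitely many atoms the content of Lemma (6.4) can be checked by direct enumeration. The following sketch is an illustration only: the four-point distribution, the choice $\Phi(w) = y$, and the helper names `make_psi` and `relation` are assumptions, not part of the paper.

```python
from fractions import Fraction

def make_psi(dist, Phi):
    """psi(w) = Pr{Phi(W) < Phi(w)} for a finite distribution {point: probability}."""
    def psi(w):
        return sum((p for v, p in dist.items() if Phi(v) < Phi(w)), Fraction(0))
    return psi

def relation(a, b):
    """Return -1, 0 or +1 according as a < b, a = b or a > b."""
    return (a > b) - (a < b)

quarter = Fraction(1, 4)
dist = {(1, 9): quarter, (3, 9): quarter, (2, 6): quarter, (5, 1): quarter}
Phi = lambda w: w[1]          # a single function with a tie at y = 9
psi = make_psi(dist, Phi)

# Compare the relation between Phi values with the relation between psi values.
agree = all(
    relation(Phi(a), Phi(b)) == relation(psi(a), psi(b))
    for a in dist for b in dist
)
```

Every point here carries positive probability, so the agreement is exact; in general it holds only outside a set of probability zero, which is what the lemma asserts.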

We turn now to the proof of (6.1). The system of functions $\varphi_1, \cdots, \varphi_M$ defines the $\Phi_1, \Phi_2, \cdots, \Phi_m$ according to Section 3. These define $\psi_1, \psi_2, \cdots, \psi_m$ according to lemma (6.4) just proved. Applying this $m$ times proves (6.3). Recalling that $\Phi_i(w_j) = \Phi_i(w_k)$ implies $\varphi_{p(i)+h}(w_j) = \varphi_{p(i)+h}(w_k)$, we see that (6.3) implies (6.2).

7. The notation $F(x + \lambda\cdot 0)$. All practitioners of analysis are familiar with $F(x + 0)$ and $F(x - 0)$, defined by

$F(x \pm 0) = \lim_{h \downarrow 0} F(x \pm h)$.

We now generalize this formal notation to

(7.1)   $F(x + \lambda\cdot 0) = \frac{1 + \lambda}{2} F(x + 0) + \frac{1 - \lambda}{2} F(x - 0)$,

where we will, in our immediate applications, need only $\lambda$'s between $-1$ and $+1$ (although the definition applies in general). Notice, for example, that

$F(x - 0) \le F(x + \lambda\cdot 0) \le F(x + 0)$, for $-1 \le \lambda \le 1$,

that if $F$ is continuous at $x$,

$F(x + \lambda\cdot 0) = F(x \pm 0) = F(x)$,

that the condition for $F$ to be normalized is

$F(x + 0\cdot 0) = F(x)$.
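A small sketch may make the notation concrete for a step cumulative. It assumes a purely discrete distribution given as a dictionary of atoms and probabilities, and the helper names are hypothetical; $F(x + \lambda\cdot 0)$ is computed as the weighted mean of the two one-sided limits, as in (7.1).

```python
from fractions import Fraction

def one_sided_limits(atoms, x):
    """Return (F(x - 0), F(x + 0)) for a discrete distribution {atom: probability}."""
    left = sum((p for a, p in atoms.items() if a < x), Fraction(0))
    right = sum((p for a, p in atoms.items() if a <= x), Fraction(0))
    return left, right

def F_lam(atoms, x, lam):
    """F(x + lam * 0) = (1 + lam)/2 * F(x + 0) + (1 - lam)/2 * F(x - 0)."""
    lam = Fraction(lam)
    left, right = one_sided_limits(atoms, x)
    return (1 + lam) / 2 * right + (1 - lam) / 2 * left

atoms = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # a fair coin on {0, 1}
```

At the jump $x = 0$ the value moves linearly from $F(0 - 0) = 0$ at $\lambda = -1$ to $F(0 + 0) = 1/2$ at $\lambda = 1$; at a continuity point such as $x = 1/2$ it does not depend on $\lambda$.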

A similar definition is made for functions of two variables, namely

$F(x + \lambda\cdot 0,\ y + \mu\cdot 0) = \frac{1 + \mu}{2} F(x + \lambda\cdot 0,\ y + 0) + \frac{1 - \mu}{2} F(x + \lambda\cdot 0,\ y - 0)$

$= \frac{1 + \lambda}{2} F(x + 0,\ y + \mu\cdot 0) + \frac{1 - \lambda}{2} F(x - 0,\ y + \mu\cdot 0)$,

where the two right-hand sides are equal if, as is the case for cumulatives, all doubly one-sided limits exist.

If $F(x_1, x_2)$ is the joint cumulative of two variates, then, when all ordinates and abscissas involved are ordinates and abscissas of continuity,

$\Pr\{a < x < b,\ c < y < d\} = F(b, d) - F(b, c) - F(a, d) + F(a, c) \ge 0$.

Passing to the limit in assorted ways, and taking linear combinations gives

(7.2)   $F(b + \mu\cdot 0,\ d + \rho\cdot 0) - F(b + \mu\cdot 0,\ c + \nu\cdot 0) - F(a + \lambda\cdot 0,\ d + \rho\cdot 0) + F(a + \lambda\cdot 0,\ c + \nu\cdot 0) \ge 0$,

for $-\infty \le a, b, c, d \le +\infty$ and $-1 \le \lambda, \mu, \nu, \rho \le 1$. This will be of use shortly.

8. The representation theorem. It was shown in Paper I [1] of this series that the uniform distribution on [0, 1] could serve as the prototype of any variate; that is, given a distribution, there is a monotone function $g$ so that $g(U)$ has the given distribution, where $U$ has the uniform distribution on [0, 1]. (In Paper I, $U$ was denoted by $X^*$.)

In the notation of the last section, there is a function $\lambda(u)$, with $|\lambda(u)| \le 1$, so that

(8.1)   $F(g(u) + \lambda(u)\cdot 0) = u$,

for all $u$. (We may, and shall, require that $g(u) = -\infty$ for $u \le 0$, and $g(u) = +\infty$ for $u \ge 1$.) It is easy to see that $g(u)$ is unique except on a set of probability zero and that $\lambda(u)$ is unique (and in fact linear) on each open interval which contains no value of $F(x)$.
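For a discrete distribution the pair $g(u)$, $\lambda(u)$ of (8.1) can be computed explicitly: if $u$ lies between $F(x - 0)$ and $F(x + 0)$ at an atom $x$, then $g(u) = x$ and solving (7.1) for $\lambda$ gives $\lambda(u) = \{2u - F(x + 0) - F(x - 0)\} / \{F(x + 0) - F(x - 0)\}$, which is linear in $u$. The sketch below assumes a finite set of atoms with positive probabilities summing to one; the function name `represent` and the three-atom example are illustrative assumptions.

```python
from fractions import Fraction

def represent(atoms, u):
    """Return (g(u), lam(u)) with F(g(u) + lam(u) * 0) = u, for 0 < u < 1."""
    left = Fraction(0)
    for x in sorted(atoms):
        right = left + atoms[x]           # left = F(x - 0), right = F(x + 0)
        if left <= u <= right:
            lam = (2 * u - right - left) / (right - left)
            return x, lam
        left = right
    raise ValueError("u must lie strictly between 0 and 1")

atoms = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
g_u, lam_u = represent(atoms, Fraction(5, 8))

# Check (8.1): recombine the one-sided limits at g(u) with weight lam(u).
F_left = sum((p for a, p in atoms.items() if a < g_u), Fraction(0))
F_right = sum((p for a, p in atoms.items() if a <= g_u), Fraction(0))
recovered_u = (1 + lam_u) / 2 * F_right + (1 - lam_u) / 2 * F_left
```

At a boundary value of $u$ shared by two neighbouring atoms the function returns the smaller atom with $\lambda = 1$; this is one of the admissible choices, since $g$ is only unique up to a set of probability zero.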

Each cumulative $F(x)$, then, serves to define $g(u)$ and $\lambda(u)$ by the equation (8.1). Two or more independent variates can be thrown back on a set of independent uniform variates by applying this process to their cumulatives separately.

Our present problem is to prove Theorem C (5.3), which applies to variates $X_1, X_2, \cdots, X_n$ which need not be independent. Let $F_i(x_i)$ be the (marginal) cumulative of $X_i$, and use (8.1) to define $g_i(u_i)$ and $\lambda_i(u_i)$. Then define the joint distribution of $U_1, U_2, \cdots, U_n$ by

$G(u_1, u_2, \cdots, u_n) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ \cdots,\ g_n(u_n) + \lambda_n(u_n)\cdot 0)$,

where $F(x_1, x_2, \cdots, x_n)$ is the joint cumulative of the $X_1, X_2, \cdots, X_n$.

We shall verify that this is the desired distribution in the case $n = 2$, leaving the general case to the reader. Consider $G(u_1, +\infty) = G(u_1, 1) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ +\infty)$. This is a cumulative, and so is $G(+\infty, u_2)$. In fact, using (8.1) they are each the uniform cumulative

$G(u) = 0$ for $u \le 0$;   $G(u) = u$ for $0 \le u \le 1$;   $G(u) = 1$ for $1 \le u$.

By (7.2) all second differences are positive, and hence $G(u_1, u_2)$ is a joint cumulative. Since its marginals are uniform, it is continuous.

Finally,

$\Pr\{g_1(U_1) < s_1,\ g_2(U_2) < s_2\} = G(F(s_1 - 0, +\infty),\ F(+\infty, s_2 - 0)) = F(s_1 - 0,\ s_2 - 0)$,

since $g_1(u_1) < s_1$ is equivalent to $u_1 < F(s_1 - 0, +\infty)$ and $g_2(u_2) < s_2$ is equivalent to $u_2 < F(+\infty, s_2 - 0)$. Thus $g_1(U_1)$ and $g_2(U_2)$ have the given bivariate distribution.

9. Proof of main theorems. We come now to the proof of Theorems $A_{m|n+1}$ and $B_{n+1}$, and we begin with $A_{m|n+1}$. According to Lemma (6.1), the various indices $i(1), i(2), \cdots, i(m)$ selected to determine the blocks will be the same, excluding cases of probability zero, whether the $\Phi_i$ or the $\psi_i$ are used. Consider the first block, which takes the forms:

$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\}$,
$S_1' = \{w \mid \psi_1(w) > \psi_1(w_{i(1)})\}$.

Another application of Lemma (6.1) shows that these sets differ by a set of probability zero, and hence their coverages are identical. It will thus suffice to prove Theorem $A_{m|n+1}$ for a fixed underlying distribution and the corresponding $\psi_i$.

According to Theorem C (5.3), the m-variate distribution of the $\psi_i(W)$ can be represented in terms of uniformly distributed variates $U_1, \cdots, U_m$ and monotone functions $g_1(U_1), \cdots, g_m(U_m)$. Now $U_1, U_2, \cdots, U_m$ have a continuous joint cumulative, so that Theorem $A_{m|n+1}$ applies to a sample of $n$ drawn from this m-variate population, with the coordinates themselves as the $m$ functions. We shall denote the coordinates of the $j$-th element of this sample by $u_1(j), \cdots, u_m(j)$.
Consider the first block, 


Itsimage,fifffl = , 

contains 




and is contained in the union of Si and Ti , where 


Ti = Kffi' 


Vi 




Thus the conclusions of Theorem $A_{m|n+1}$ hold for $S_1, T_1, \cdots, S_m, T_m, S_{m|n+1}$.

Now while Theorem $A_{m|n+1}$ mentions the underlying $W$'s implicitly, careful study shows that they are not really involved; only the joint distribution of the $\varphi_i$, which in our present case are the $\psi_i$, matters. Since this is the same for the $\psi_i(W)$ and the $g_i(U_i)$, Theorem $A_{m|n+1}$ must hold for the $\psi_i$ and the theorem is proved.

Theorem $B_{n+1}$ is a special case of Theorem $A_{m|n+1}$.


REFERENCES

[1] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192. (Also cited as Paper I.)

[2] J. W. Tukey, "Non-parametric estimation. II. Statistically equivalent blocks and multivariate tolerance regions. The continuous case," Annals of Math. Stat., Vol. 18 (1947), pp. 529-539. (Also cited as Paper II.)



ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD 
ESTIMATE OF AN UNKNOWN PARAMETER OF A DISCRETE 
STOCHASTIC PROCESS 

By Abraham Wald 

Columbia University

Summary. Asymptotic properties of maximum likelihood estimates have been studied so far mainly in the case of independent observations. In this paper the case of stochastically dependent observations is considered. It is shown that under certain restrictions on the joint probability distribution of the observations the maximum likelihood equation has at least one root which is a consistent estimate of the parameter $\theta$ to be estimated. Furthermore, any root of the maximum likelihood equation which is a consistent estimate of $\theta$ is shown to be asymptotically efficient. Since the maximum likelihood estimate is always a root of the maximum likelihood equation, consistency of the maximum likelihood estimate implies its asymptotic efficiency.

1. Introduction. Let $\{X_i\}$, ($i = 1, 2, \cdots$, ad inf.), be a sequence of chance variables. It is assumed that for any positive integral value $n$ the first $n$ chance variables $X_1, \cdots, X_n$ admit a joint probability density function $p_n(x_1, \cdots, x_n, \theta)$ involving an unknown parameter $\theta$. The consistency relations

(1.1)   $\int_{-\infty}^{+\infty} p_{n+1}(x_1, \cdots, x_{n+1}, \theta)\, dx_{n+1} = p_n(x_1, \cdots, x_n, \theta)$

are assumed to hold.

In what follows, for any chance variable $u$ the symbol $E(u \mid \theta)$ will denote the expected value of $u$ when $\theta$ is the true parameter value.

Let $t_n(x_1, \cdots, x_n)$ be an unbiassed estimate of $\theta$. Cramér [1] and Rao [2] have shown that under some weak regularity conditions on the distribution function $p_n(x_1, \cdots, x_n, \theta)$, the variance of $t_n$ cannot fall short of the value

(1.2)   $\frac{1}{c_n(\theta)} = \frac{1}{E\left[\left(\frac{\partial \log p_n}{\partial \theta}\right)^2 \,\middle|\, \theta\right]}$.

Thus, for any unbiassed estimate $t_n$ the variate $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance $\ge 1$. An estimate $t_n$ is called efficient if $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance 1.
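As a concrete check of the bound (1.2), consider a discrete analogue that is not part of the paper: $n$ independent Bernoulli trials with success probability $\theta$, with probabilities in place of densities. The sketch below enumerates all samples exactly; the function name and the values $\theta = 1/3$, $n = 4$ are illustrative assumptions. The sample proportion attains the bound, so it is efficient in the sense just defined.

```python
from fractions import Fraction
from itertools import product

def bernoulli_information_and_variance(theta, n):
    """Exact c_n(theta) and exact variance of the sample proportion, by enumeration."""
    c_n = Fraction(0)
    variance = Fraction(0)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        prob = theta ** k * (1 - theta) ** (n - k)
        score = k / theta - (n - k) / (1 - theta)    # d log p_n / d theta
        c_n += prob * score ** 2
        variance += prob * (Fraction(k, n) - theta) ** 2
    return c_n, variance

theta = Fraction(1, 3)
c_n, variance = bernoulli_information_and_variance(theta, 4)
```

Here the product `c_n * variance` equals one, which is the statement that $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has unit variance.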

A sequence $\{t_n\}$, ($n = 1, 2, \cdots$, ad inf.), of estimates is said to be asymptotically efficient if the mean of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is zero and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is 1 in the limit as $n \to \infty$. In the literature usually the additional requirement is made that the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ be normal. To make a distinction between the two cases when the condition concerning the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is fulfilled or not, we shall say that $\{t_n\}$ is asymptotically efficient in the wide sense if it satisfies the conditions concerning the mean and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. If, in addition, the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is normal, we shall say that $\{t_n\}$ is asymptotically efficient in the strict sense. Clearly, if $\{t_n\}$ is asymptotically efficient in the strict sense, it is also asymptotically efficient in the wide sense.

A word of clarification is needed as to the meaning of the conditions concerning the mean and variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. One interpretation would be that the requirement is that

(1.3)   $\lim_{n \to \infty} E[\sqrt{c_n(\theta)}\,(t_n - \theta) \mid \theta] = 0$

and

(1.4)   $\lim_{n \to \infty} E[c_n(\theta)\,(t_n - \theta)^2 \mid \theta] = 1$.

Another interpretation would be that the requirement is that the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, provided that the limit distribution exists as $n \to \infty$, should have zero mean and unit variance. These two interpretations are certainly not equivalent. It seems to the author that the mean and variance of the limiting distribution are more relevant than the limits of the mean and the variance. We shall, therefore, adopt the following definition of asymptotic efficiency:

Definition: A sequence $\{t_n\}$ of estimates is said to be asymptotically efficient in the wide sense if a sequence $\{u_n\}$, ($n = 1, 2, \cdots$, ad inf.), of chance variables exists such that

(1.5)   $\lim_{n \to \infty} E(u_n \mid \theta) = 0$,   $\lim_{n \to \infty} E(u_n^2 \mid \theta) = 1$

and

(1.6)   $\sqrt{c_n(\theta)}\,(t_n - \theta) - u_n$

converges stochastically to zero as $n \to \infty$. If, in addition, the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and is normal, $\{t_n\}$ is said to be asymptotically efficient in the strict sense.

The reason that a sequence $\{u_n\}$ of chance variables is considered in the above definition, instead of the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, is that the existence of a limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is not postulated. If a limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and if this limiting distribution has zero mean and unit variance, a sequence $\{u_n\}$ of chance variables satisfying the conditions (1.5) and (1.6) always exists. This can be seen as follows: Let $T_n$ denote the chance variable $\sqrt{c_n(\theta)}\,(t_n - \theta)$ and let $F_n(t) = \text{prob.}\{T_n < t\}$. If a limiting distribution of $T_n$ exists and if this limiting distribution has zero mean and unit variance, then

(1.7)   $\lim_{a \to \infty} \lim_{n \to \infty} \int_{-a}^{a} t\, dF_n(t) = 0$ and $\lim_{a \to \infty} \left[\lim_{n \to \infty} \int_{-a}^{a} t^2\, dF_n(t)\right] = 1$.

From (1.7) it follows that there exists a sequence $\{a_n\}$, ($n = 1, 2, \cdots$, ad inf.), of positive values such that the following conditions are fulfilled:

(1.8)   $\lim_{n \to \infty} \int_{-a_n}^{a_n} t\, dF_n(t) = 0$;   $\lim_{n \to \infty} \int_{-a_n}^{a_n} t^2\, dF_n(t) = 1$;   $\lim_{n \to \infty} \text{Prob}\{|T_n| > a_n\} = 0$.

Let $u_n$ be a chance variable which is equal to $T_n$ whenever $|T_n| \le a_n$, and equal to zero otherwise. Clearly, the sequence $\{u_n\}$ will satisfy conditions (1.5) and (1.6).

In the following section we shall formulate some assumptions concerning the probability density function $p_n(x_1, \cdots, x_n, \theta)$. It will then be shown in section 3 that there exists a root of the maximum likelihood equation

(1.9)   $\frac{\partial \log p_n(x_1, \cdots, x_n, \theta)}{\partial \theta} = 0$

which is asymptotically efficient at least in the wide sense.


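2. Assumptions concerning the probability density $p_n(x_1, \cdots, x_n, \theta)$. We shall assume that there exists a finite non-degenerate interval $A$ on the $\theta$-axis such that the following conditions hold:

Condition 1. The derivatives $\frac{\partial^i p_n}{\partial \theta^i}$, ($i = 1, 2, 3$), exist for all $\theta$ in $A$ and for all samples $(x_1, \cdots, x_n)$ except perhaps for a set of measure zero. We have furthermore,

(2.1)   $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^i p_n}{\partial \theta^i}\, dx_1 \cdots dx_n = 0$,   ($i = 1, 2$).
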
Condition 2. For any $\theta$ in $A$ we have $\lim_{n \to \infty} c_n(\theta) = \infty$.

Condition 3. For any $\theta$ in $A$ the standard deviation of $\frac{\partial^2 \log p_n}{\partial \theta^2}$ divided by the expected value of $\frac{\partial^2 \log p_n}{\partial \theta^2}$ (both computed under the assumption that $\theta$ is true) converges to zero as $n \to \infty$.

Condition 4. There exists a positive $\delta$ such that for any $\theta$ in $A$ the expression

(2.2)   $\frac{1}{c_n(\theta)}\, E\left[\ \text{l.u.b.}_{\theta'} \left|\frac{\partial^3 \log p_n(x_1, \cdots, x_n, \theta')}{\partial \theta'^3}\right| \ \middle|\ \theta\right]$

is a bounded function of $n$, where $\theta'$ is restricted to the interval $|\theta' - \theta| \le \delta$.

In what follows in this section, as well as in section 3, the domain of $\theta$ will be restricted to interior points of the interval $A$ unless a statement to the contrary is explicitly made.

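Clearly

(2.3)   $E\left(\frac{\partial \log p_n}{\partial \theta} \,\middle|\, \theta\right) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n$.

It follows from Condition 1 that

(2.4)   $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial p_n}{\partial \theta}\, dx_1 \cdots dx_n = 0$.

Hence,

(2.5)   $E\left(\frac{\partial \log p_n}{\partial \theta} \,\middle|\, \theta\right) = 0$.

We have

(2.6)   $\frac{\partial^2 \log p_n}{\partial \theta^2} = \frac{1}{p_n} \frac{\partial^2 p_n}{\partial \theta^2} - \left(\frac{\partial \log p_n}{\partial \theta}\right)^2$.

Hence

(2.7)   $E\left(\frac{\partial^2 \log p_n}{\partial \theta^2} \,\middle|\, \theta\right) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 p_n}{\partial \theta^2}\, dx_1 \cdots dx_n - c_n(\theta)$.

But

(2.8)   $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 p_n}{\partial \theta^2}\, dx_1 \cdots dx_n = 0$

because of Condition 1. From (2.7) and (2.8) we obtain

(2.9)   $E\left(\frac{\partial^2 \log p_n}{\partial \theta^2} \,\middle|\, \theta\right) = -c_n(\theta)$.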
Conditions 3 and 4 will generally be fulfilled when the stochastic dependence of $X_i$ on $X_j$ decreases sufficiently fast with increasing value of $|i - j|$. For, in such cases, the following order relations will generally hold: The standard deviation of $\frac{\partial^2 \log p_n}{\partial \theta^2}$ will, in general, be of the order $\sqrt{n}$, the expected value of $\text{l.u.b.}_{|\theta' - \theta| \le \delta} \left|\frac{\partial^3 \log p_n}{\partial \theta'^3}\right|$ will usually be of the order $n$, and $\frac{c_n(\theta)}{n}$ will generally have a positive lower bound and a finite upper bound.



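3. Proof that the maximum likelihood equation has a root which is an asymptotically efficient estimate of $\theta$ (at least in the wide sense). Let $\theta_0$ denote the true parameter value and let $\theta$ be any other value. We put

(3.1)   $\Phi_n(x_1, \cdots, x_n, \theta) = \frac{\partial \log p_n(x_1, \cdots, x_n, \theta)}{\partial \theta}$.

Expanding $\Phi_n(x_1, \cdots, x_n, \theta)$ in a Taylor expansion around $\theta = \theta_0$ we obtain

(3.2)   $\Phi_n(x_1, \cdots, x_n, \theta) = \Phi_n(x_1, \cdots, x_n, \theta_0) + (\theta - \theta_0)\,\Phi_n'(x_1, \cdots, x_n, \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^2\, \Phi_n''(x_1, \cdots, x_n, \theta_n')$

where $\theta_n'$ is some value between $\theta_0$ and $\theta$. Dividing both sides of (3.2) by $c_n(\theta_0)$ we obtain

(3.3)   $\frac{\Phi_n(x_1, \cdots, x_n, \theta)}{c_n(\theta_0)} = \frac{\Phi_n(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} + (\theta - \theta_0)\, \frac{\Phi_n'(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} + \tfrac{1}{2}(\theta - \theta_0)^2\, \frac{\Phi_n''(x_1, \cdots, x_n, \theta_n')}{c_n(\theta_0)}$.

From Condition 3 and equation (2.9) it follows that

(3.4)   $\text{plim}_{n \to \infty} \frac{\Phi_n'(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} = -1$

where the operator plim stands for convergence in probability (stochastic convergence).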

According to equation (2.5) the expected value of $\Phi_n(x_1, \cdots, x_n, \theta_0)$ is zero. Since the variance of $\Phi_n(x_1, \cdots, x_n, \theta_0)$ is equal to $c_n(\theta_0)$, and since $\lim_{n \to \infty} c_n(\theta) = \infty$, we have

(3.5)   $\text{plim}_{n \to \infty} \frac{\Phi_n(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} = 0$.

It follows from Condition 4 that for any $\theta$ with $|\theta - \theta_0| \le \delta$ we have

(3.6)   $\frac{1}{c_n(\theta_0)}\, E\left[\,|\Phi_n''(x_1, \cdots, x_n, \theta)| \;\middle|\; \theta_0\right] = O(1)$.

According to Markoff's inequality the probability that a positive random variable will exceed $\lambda$-times its expected value is not greater than $1/\lambda$. Hence it follows from (3.6) that for any $\epsilon > 0$ we can find a positive value $k_\epsilon$ such that

(3.7)   $\limsup_{n \to \infty} \text{Prob}\left\{\left|\frac{\Phi_n''(x_1, \cdots, x_n, \theta_n')}{c_n(\theta_0)}\right| \ge k_\epsilon\right\} \le \epsilon$.

Let $\rho$ be any given positive number. The probability that the maximum likelihood equation

(3.8)   $\Phi_n(x_1, \cdots, x_n, \theta) = 0$

will have a root in the interval $(\theta_0 - \rho, \theta_0 + \rho)$ converges to one as $n \to \infty$. This follows easily from (3.3), (3.4), (3.5) and (3.7). Thus, we have shown that the maximum likelihood equation has a root $\hat{\theta}_n$ which is a consistent estimate, i.e. it satisfies the relation

(3.9)   $\text{plim}_{n \to \infty} (\hat{\theta}_n - \theta_0) = 0$.

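We shall now show that if $\hat{\theta}_n$ is a root of the maximum likelihood equation (3.8) and if $\hat{\theta}_n$ is a consistent estimate, then $\hat{\theta}_n$ is also asymptotically efficient, at least in the wide sense. For this purpose we substitute $\hat{\theta}_n$ for $\theta$ in (3.3) and multiply both sides of the equation by $\sqrt{c_n(\theta_0)}$. We then obtain

(3.10)   $0 = \frac{\Phi_n(x_1, \cdots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}} + \sqrt{c_n(\theta_0)}\,(\hat{\theta}_n - \theta_0)\, \frac{\Phi_n'(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} + \tfrac{1}{2} \sqrt{c_n(\theta_0)}\,(\hat{\theta}_n - \theta_0)^2\, v_n$

where

(3.11)   $v_n = \frac{\Phi_n''(x_1, \cdots, x_n, \theta_n')}{c_n(\theta_0)}$.

Let

(3.12)   $y_n = \frac{\Phi_n(x_1, \cdots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}}$ and $z_n = \sqrt{c_n(\theta_0)}\,(\hat{\theta}_n - \theta_0)$.

Then (3.10) gives

(3.13)   $-y_n = z_n\, \frac{\Phi_n'(x_1, \cdots, x_n, \theta_0)}{c_n(\theta_0)} + \tfrac{1}{2} z_n (\hat{\theta}_n - \theta_0)\, v_n$.

It follows from (3.7) and (3.9) that

(3.14)   $\text{plim}_{n \to \infty} (\hat{\theta}_n - \theta_0)\, v_n = 0$.

From (3.4), (3.13) and (3.14) we obtain

(3.15)   $-y_n = z_n(-1 + \epsilon_n)$

where

(3.16)   $\text{plim}_{n \to \infty} \epsilon_n = 0$.

Since $E y_n = 0$ and $E y_n^2 = 1$, it follows from (3.15) and (3.16) that

(3.17)   $\text{plim}_{n \to \infty} (z_n - y_n) = 0$.

The asymptotic efficiency (in the wide sense) of $\hat{\theta}_n$ is an immediate consequence of (3.17). Our main result may be summarized in the following theorem: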
Theorem. If the true value of the parameter $\theta$ is an interior point of an interval $A$ satisfying the conditions 1-4, then the maximum likelihood equation (1.9) has a root¹ which is a consistent estimate of $\theta$. Furthermore, any root of (1.9) which is a consistent estimate of $\theta$ is also asymptotically efficient at least in the wide sense.

Since the maximum likelihood estimate is a root of (1.9), it follows from the above theorem that whenever the maximum likelihood estimate is consistent, it is also asymptotically efficient at least in the wide sense.

REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.

[2] C. R. Rao, "Information and the accuracy attainable in the estimation of statistical parameters", Bull. Calcutta Math. Soc., Vol. 37 (1946).


¹ The probability that (1.9) has at least one root converges to unity as $n \to \infty$.



DISTRIBUTION OF A ROOT OF A DETERMINANTAL EQUATION 

By D. N. Nanda

Institute of Statistics, University of North Carolina

1. Summary. S. N. Roy [2] obtained in 1943 the distribution of the maximum, minimum and any intermediate one of the roots of certain determinantal equations based on covariance matrices of two samples on the null hypothesis of equal covariance matrices in the two populations. The present paper gives a different method of working out the distribution of any of these roots under the same hypothesis. The distribution of the largest, smallest and any intermediate root when the roots are specified by their position in a monotonic arrangement has been derived for p = 2, 3, 4, and 5 by the new method. The method is applicable for obtaining the distribution of the roots of an equation of any order, when the distributions of the roots of lower order equations have been worked out.


2. Introduction. If $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ are two p-variate sample matrices with $n_1$ and $n_2$ degrees of freedom respectively, and $S = xx'/n_1$ and $S^* = x^*x^{*\prime}/n_2$ are the covariance matrices which under the null hypothesis are independent estimates of the same population covariance matrix, then the joint distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$ and $B = n_2 S^*$, has been obtained by Hsu [1] in 1939. The distribution density is

m, M, r) = ^ 


r(i/2) 


tul ini ^<^7 


(0 g ^ 01-1 g • • • 01 ^ 1), 


where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$, and $\nu = n_2 - p + 1$.

This formula also gives the joint distribution of the squares of canonical correlations on the null hypothesis that the two sets of variates are independent
ID. If 





are the observations on the two sets of canonical variates and the x’s au* nor¬ 
mally distributed, independently of the w’a, then the equation for the eanoniciil 
roots is I — 6Vxx \ - 0 , wheie 6, = t\ and Vxu, ■== A'll’ etc., ... 

It is observed that is like A with n\ - q and 7** — is 

like $B$ with $n_2 = N - q - 1$ and the above equation is reduced to the form $|A - \theta(A + B)| = 0$. It is under this condition that $R(l, \mu, \nu)$ gives the joint distribution density of $r_1^2, r_2^2, \cdots, r_l^2$ where $l = \min(p, q)$, $\mu = |p - q| + 1$, and $\nu = N - p - q$.

3. Notation and preliminaries. 

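(a). Let

$$\prod_{i<j} (\theta_i - \theta_j) = \{1, 2, 3, \dots, l\}.$$

It is known that the value of the Vandermonde determinant

$$\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ \theta_1 & \theta_2 & \theta_3 & \cdots & \theta_l \\ \theta_1^2 & \theta_2^2 & \theta_3^2 & \cdots & \theta_l^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \theta_1^{l-1} & \theta_2^{l-1} & \theta_3^{l-1} & \cdots & \theta_l^{l-1} \end{vmatrix}$$

is equal to $\prod_{i<j} (\theta_j - \theta_i) = (-1)^{l(l-1)/2} \{1, 2, 3, \dots, l\}$.

Then

$$\begin{vmatrix} 1 & 1 & 1 \\ \theta_1 & \theta_2 & \theta_3 \\ \theta_1^2 & \theta_2^2 & \theta_3^2 \end{vmatrix} = (\theta_2 - \theta_1)(\theta_3 - \theta_1)(\theta_3 - \theta_2) = -\{1, 2, 3\},$$

but the determinant can also, by expansion in minors of the first row, be expressed as

$$-\left[\theta_1\theta_2\{1, 2\} + \theta_2\theta_3\{2, 3\} + \theta_3\theta_1\{3, 1\}\right],$$

where

$$\theta_1 - \theta_2 = \{1, 2\}.$$

Hence

$$\{1, 2, 3\} = \theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}. \tag{2}$$

Similarly

$$\{1, 2, 3, 4\} = \theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\}, \tag{3}$$

and

$$\{1, 2, 3, 4, 5\} = \theta_1\theta_2\theta_3\theta_4\{1, 2, 3, 4\} + \theta_5\theta_1\theta_2\theta_3\{5, 1, 2, 3\} + \theta_4\theta_5\theta_1\theta_2\{4, 5, 1, 2\} + \theta_3\theta_4\theta_5\theta_1\{3, 4, 5, 1\} + \theta_2\theta_3\theta_4\theta_5\{2, 3, 4, 5\}. \tag{4}$$

It is seen that in the successive terms the $\theta$'s are present in a decreasing order.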
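The expansions (2), (3) and (4) can be checked with exact rational arithmetic. The following is only a sketch: the function names and the test values of the roots are illustrative assumptions, and the sign rule in `cyclic_expansion` (all plus for odd l, alternating for even l) is read off from the three identities above rather than stated in the text.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def bracket(theta, labels):
    """{i1, i2, ...}: product of (theta[a] - theta[b]) for every a listed before b."""
    return prod((theta[a] - theta[b] for a, b in combinations(labels, 2)),
                start=Fraction(1))


def cyclic_expansion(theta):
    """Right-hand side of identities (2)-(4) for l = len(theta) roots."""
    l = len(theta)
    total = Fraction(0)
    for s in range(l):
        omitted = l - s  # label left out of the s-th term
        # Remaining labels in cyclic order, starting just after the omitted one.
        labels = [(omitted + t - 1) % l + 1 for t in range(1, l)]
        sign = (-1) ** ((l - 1) * s)  # all plus for odd l, alternating for even l
        coefficient = prod((theta[i] for i in labels), start=Fraction(1))
        total += sign * coefficient * bracket(theta, labels)
    return total


def sample_roots(l):
    """Hypothetical distinct values theta_1 > theta_2 > ... > theta_l in (0, 1)."""
    return {i: Fraction(l + 1 - i, l + 2) for i in range(1, l + 1)}


for l in (3, 4, 5):
    theta = sample_roots(l)
    print(l, bracket(theta, list(range(1, l + 1))) == cyclic_expansion(theta))
```

For l = 4 the loop produces the label lists [1, 2, 3], [4, 1, 2], [3, 4, 1], [2, 3, 4], matching the order of the terms in (3).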

(b). Let

$$(a, b; m, n) = \left[y^m (1 - y)^n\right]_a^b = b^m (1 - b)^n - a^m (1 - a)^n,$$

and

$$(a, 1, b; m, n) = \int_a^b y^m (1 - y)^n \, dy;$$

then

$$(a, 1, b; m + 1, n) = -\frac{(a, b; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}\,(a, 1, b; m, n), \tag{4, a}$$

by a combination of the transformations obtained by partial integration and by
breaking up $(1 - y)^{n+1}$ into $(1 - y)^n - y(1 - y)^n$.
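The reduction formula (4, a) can be confirmed exactly for integer exponents. This is a minimal sketch, assuming integer m, n ≥ 0 so that the integrand is a polynomial; the function names are illustrative and not part of the paper.

```python
from fractions import Fraction
from math import comb


def boundary(a, b, m, n):
    """(a, b; m, n) = b^m (1 - b)^n - a^m (1 - a)^n."""
    return b ** m * (1 - b) ** n - a ** m * (1 - a) ** n


def integral(a, b, m, n):
    """(a, 1, b; m, n): integral of y^m (1 - y)^n over [a, b], integers m, n >= 0."""
    total = Fraction(0)
    for j in range(n + 1):  # expand (1 - y)^n by the binomial theorem
        p = m + j + 1
        total += Fraction(comb(n, j) * (-1) ** j, p) * (b ** p - a ** p)
    return total


def reduced(a, b, m, n):
    """Right-hand side of (4, a); it should equal (a, 1, b; m + 1, n)."""
    return (-boundary(a, b, m + 1, n + 1) + (m + 1) * integral(a, b, m, n)) / (m + n + 2)


a, b = Fraction(1, 5), Fraction(3, 4)
print(integral(a, b, 3, 2) == reduced(a, b, 2, 2))
```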

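(c) Let

$$(0, 2, 1, b; m, n) = \int_{0<\theta_2<\theta_1<b} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\} \, d\theta_1 \, d\theta_2,$$

$$(0, 2, b, 1, c; m, n) = \int_{0<\theta_2<b<\theta_1<c} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\} \, d\theta_1 \, d\theta_2,$$

and

$$(0, 3, b, 2, c, 1, d; m + 1, n) = \int_{0<\theta_3<b<\theta_2<c<\theta_1<d} (\theta_1\theta_2\theta_3)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \{1, 2, 3\} \, d\theta_1 \, d\theta_2 \, d\theta_3.$$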

(d) Let

$$T_{a,b}^{m,n}\, g(y) = \int_a^b y^m (1 - y)^n g(y) \, dy,$$

then

$$T_{a,b}^{m,n}\,(0, y; k, l) = (a, 1, b; m + k, n + l) \qquad (k > 0),$$

and

$$T_{a,b}^{m,n}\,(b, 1, c; k, l) = (a, 1, b; m, n)(b, 1, c; k, l).$$

With these preliminaries we proceed to derive the distribution of the roots. 





4. Distribution of the largest root. Let us suppose that the roots are arranged
in decreasing order such that for l roots we have

$$0 < \theta_l < \theta_{l-1} < \theta_{l-2} < \cdots < \theta_2 < \theta_1 < 1.$$

If the distribution density $R(l, \mu, \nu)$ given by (1) be expressed as

$$R(l, m, n) = C(l, m, n) \prod_{i=1}^{l} \theta_i^m (1 - \theta_i)^n \prod_{i<j} (\theta_i - \theta_j),$$

then the distribution of the largest root in the general case would be given by

$$\Pr(\theta_1 \le x) = C(l, m, n)\,(0, l, l - 1, \dots, 2, 1, x; m, n).$$


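Now we shall derive the distribution of the largest root for l = 2, 3, 4, and 5.

(a) l = 2.

$$\Pr(\theta_1 \le x) = C(2, m, n)\,(0, 2, 1, x; m, n).$$

$$\begin{aligned}
(0, 2, 1, x; m, n) &= \int_{0<\theta_2<\theta_1<x} (\theta_1\theta_2)^m (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\} \, d\theta_1 \, d\theta_2 \\
&= \int_{0<\theta_2<\theta_1<x} \theta_2^m (1 - \theta_2)^n \, \theta_1^{m+1} (1 - \theta_1)^n \, d\theta_1 \, d\theta_2 \\
&\quad - \int_{0<\theta_2<\theta_1<x} \theta_1^m (1 - \theta_1)^n \, \theta_2^{m+1} (1 - \theta_2)^n \, d\theta_1 \, d\theta_2.
\end{aligned}$$

The limits in the successive integrals are to be so adjusted as to keep the integrand
the same. Then, using the notation given in section 3(d) and equation (4, a),

$$(0, 2, 1, x; m, n) = T_{0,x}^{m,n}\,(y, 1, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 1, y; m + 1, n) \tag{5}$$

or

$$(0, 2, 1, x; m, n) = T_{0,x}^{m,n} \left[ -\frac{(y, x; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}\,(y, 1, x; m, n) + \frac{(0, y; m + 1, n + 1)}{m + n + 2} - \frac{m + 1}{m + n + 2}\,(0, 1, y; m, n) \right].$$

Now by a change in the order of integration,

$$T_{0,x}^{m,n} \left[ (0, 1, y; m, n) - (y, 1, x; m, n) \right] = 0.$$

Therefore

$$\begin{aligned}
(m + n + 2)(0, 2, 1, x; m, n) &= T_{0,x}^{m,n} \left[ 2(0, y; m + 1, n + 1) - (0, x; m + 1, n + 1) \right] \\
&= 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n).
\end{aligned}$$

Hence

$$\begin{aligned}
\Pr(\theta_1 \le x) &= \frac{C(2, m, n)}{m + n + 2} \left[ 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n) \right] \\
&= C(2, m, n) \left[ \frac{2}{m + n + 2} \int_0^x y^{2m+1} (1 - y)^{2n+1} \, dy - \frac{x^{m+1} (1 - x)^{n+1}}{m + n + 2} \int_0^x y^m (1 - y)^n \, dy \right].
\end{aligned}$$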
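The l = 2 result can be compared with a direct evaluation of the double integral. The sketch below assumes integer m, n ≥ 0 (so that every integrand is a polynomial and the arithmetic is exact); the helper names are illustrative, and the constant C(2, m, n) is not taken from (1) but fixed by requiring the distribution function to equal 1 at x = 1.

```python
from fractions import Fraction
from math import comb


def weight(m, n, shift=0):
    """Coefficients (lowest power first) of y^(m + shift) * (1 - y)^n."""
    return [Fraction(0)] * (m + shift) + [Fraction(comb(n, j) * (-1) ** j) for j in range(n + 1)]


def multiply(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def antiderivative(p):
    """Antiderivative that vanishes at 0."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def evaluate(p, t):
    return sum((c * t ** k for k, c in enumerate(p)), Fraction(0))


def incomplete(x, m, n):
    """(0, 1, x; m, n)."""
    return evaluate(antiderivative(weight(m, n)), x)


def direct_l2(x, m, n):
    """(0, 2, 1, x; m, n), integrating over theta_2 first and theta_1 second."""
    w0 = antiderivative(weight(m, n))      # integral of w(theta_2)
    w1 = antiderivative(weight(m, n, 1))   # integral of theta_2 * w(theta_2)
    first = multiply(weight(m, n, 1), w0)  # theta_1 * w(theta_1) * W0(theta_1)
    second = multiply(weight(m, n), w1)    # w(theta_1) * W1(theta_1)
    return evaluate(antiderivative(first), x) - evaluate(antiderivative(second), x)


def closed_form_l2(x, m, n):
    """Bracket of the l = 2 result divided by (m + n + 2)."""
    edge = x ** (m + 1) * (1 - x) ** (n + 1)  # (0, x; m + 1, n + 1)
    return (2 * incomplete(x, 2 * m + 1, 2 * n + 1) - edge * incomplete(x, m, n)) / (m + n + 2)


def largest_root_cdf_l2(x, m, n):
    """Pr(theta_1 <= x) for l = 2, normalised so that the value at x = 1 is 1."""
    return closed_form_l2(x, m, n) / closed_form_l2(Fraction(1), m, n)


print(largest_root_cdf_l2(Fraction(1, 2), 0, 0))
```

With m = n = 0 the density is proportional to $\theta_1 - \theta_2$ on the ordered region, and the distribution function of the largest root reduces to $x^3$.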



(b) l = 3. For this case we need certain results for l = 2 which can be easily
obtained and are given below:

$$(a, 2, 1, b; m, n) = \frac{1}{m + n + 2} \Big\{ 2(a, 1, b; 2m + 1, 2n + 1) - \left[ (0, a; m + 1, n + 1) + (0, b; m + 1, n + 1) \right] \times (a, 1, b; m, n) \Big\} \tag{6}$$

and

$$(a, 2, b, 1, c; m, n) = \frac{1}{m + n + 2} \Big[ -(0, a; m + 1, n + 1)(b, 1, c; m, n) + (0, b; m + 1, n + 1)(a, 1, c; m, n) - (0, c; m + 1, n + 1)(a, 1, b; m, n) \Big]. \tag{7}$$

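Now

$$\begin{aligned}
(0, 3, 2, 1, x; m, n) &= \int_{0<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \{1, 2, 3\} \, d\theta_1 \, d\theta_2 \, d\theta_3 \\
&= \int (\theta_1\theta_2\theta_3)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \left[ \theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\} \right] d\theta_1 \, d\theta_2 \, d\theta_3 \\
&\qquad \text{(using equation (2))} \\
&= \int_{0<\theta_3<\theta_2<\theta_1<x} \theta_3^m (1 - \theta_3)^n (\theta_1\theta_2)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n \{1, 2\} \, d\theta_1 \, d\theta_2 \, d\theta_3 + \int + \int
\end{aligned}$$

or

$$(0, 3, 2, 1, x; m, n) = T_{0,x}^{m,n}\,(y, 2, 1, x; m + 1, n) + T_{0,x}^{m,n}\,(0, 1, y, 2, x; m + 1, n) + T_{0,x}^{m,n}\,(0, 2, 1, y; m + 1, n),$$

but the $\theta$'s are to be always arranged in the same order, hence

$$(0, 3, 2, 1, x; m, n) = T_{0,x}^{m,n}\,(y, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 2, y, 1, x; m + 1, n) + T_{0,x}^{m,n}\,(0, 2, 1, y; m + 1, n).$$

Using equations (6) and (7), we have

$$\begin{aligned}
(0, 3, 2, 1, x; m, n) &= \frac{1}{m + n + 3}\, T_{0,x}^{m,n} \Big\{ 2(y, 1, x; 2m + 3, 2n + 1) - (y, 1, x; m + 1, n) \left[ (0, y; m + 2, n + 1) + (0, x; m + 2, n + 1) \right] \\
&\qquad - (0, 1, x; m + 1, n)(0, y; m + 2, n + 1) + (0, 1, y; m + 1, n)(0, x; m + 2, n + 1) \\
&\qquad + 2(0, 1, y; 2m + 3, 2n + 1) - (0, 1, y; m + 1, n)(0, y; m + 2, n + 1) \Big\} \\
&= \frac{1}{m + n + 3}\, T_{0,x}^{m,n} \Big\{ 2(y, 1, x; 2m + 3, 2n + 1) + 2(0, 1, y; 2m + 3, 2n + 1) \\
&\qquad - (0, y; m + 2, n + 1) \left[ (0, 1, x; m + 1, n) + (0, 1, y; m + 1, n) + (y, 1, x; m + 1, n) \right] \\
&\qquad - (0, x; m + 2, n + 1) \left[ (y, 1, x; m + 1, n) - (0, 1, y; m + 1, n) \right] \Big\} \\
&= \frac{1}{m + n + 3}\, T_{0,x}^{m,n} \Big\{ 2(0, 1, x; 2m + 3, 2n + 1) - 2(0, y; m + 2, n + 1)(0, 1, x; m + 1, n) \\
&\qquad - (0, x; m + 2, n + 1) \left[ (y, 1, x; m + 1, n) - (0, 1, y; m + 1, n) \right] \Big\}.
\end{aligned}$$

Using equation (5), we have

$$\begin{aligned}
(0, 3, 2, 1, x; m, n) = \frac{1}{m + n + 3} \Big\{ & 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) \\
& - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) \\
& - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \Big\},
\end{aligned}$$

so that

$$\begin{aligned}
\Pr(\theta_1 \le x) = \frac{C(3, m, n)}{m + n + 3} \Big\{ & 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) \\
& - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) \\
& - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \Big\}. 
\end{aligned} \tag{8}$$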

(c) l = 4. To determine (0, 4, 3, 2, 1, x; m, n) we need the values of
(a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1, c; m, n), which are
obtained according to the procedure given above.

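Now

$$\begin{aligned}
(0, 4, 3, 2, 1, x; m, n) &= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} (\theta_1\theta_2\theta_3\theta_4)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n (1 - \theta_4)^n \{1, 2, 3, 4\} \, d\theta_1 \, d\theta_2 \, d\theta_3 \, d\theta_4 \\
&= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} \theta_4^m (1 - \theta_4)^n (\theta_1\theta_2\theta_3)^m (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \\
&\qquad \cdot \left[ \theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\} \right] d\theta_1 \, d\theta_2 \, d\theta_3 \, d\theta_4 \\
&= \int_{0<\theta_4<\theta_3<\theta_2<\theta_1<x} \theta_4^m (1 - \theta_4)^n \left[ (\theta_1\theta_2\theta_3)^{m+1} (1 - \theta_1)^n (1 - \theta_2)^n (1 - \theta_3)^n \{1, 2, 3\} \right] d\theta_1 \, d\theta_2 \, d\theta_3 \, d\theta_4 - \int + \int - \int \\
&= T_{0,x}^{m,n}\,(y, 3, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 1, y, 3, 2, x; m + 1, n) \\
&\qquad + T_{0,x}^{m,n}\,(0, 2, 1, y, 3, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 3, 2, 1, y; m + 1, n) \\
&= T_{0,x}^{m,n}\,(y, 3, 2, 1, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 3, y, 2, 1, x; m + 1, n) \\
&\qquad + T_{0,x}^{m,n}\,(0, 3, 2, y, 1, x; m + 1, n) - T_{0,x}^{m,n}\,(0, 3, 2, 1, y; m + 1, n).
\end{aligned}$$

Using the results of (a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1,
c; m, n), we have $\Pr(\theta_1 \le x)$ equal to

$$\begin{aligned}
C(4, m, n)(0, 4, 3, 2, 1, x; m, n) = \frac{C(4, m, n)}{m + n + 4} \Big[ & 2(0, 1, x; 2m + 5, 2n + 1)(0, 2, 1, x; m, n) \\
& - \frac{2(0, 1, x; 2m + 4, 2n + 1)}{m + n + 3} \big\{ 2(0, 1, x; 2m + 2, 2n + 1) \\
& \qquad - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n) \big\} \\
& + 2(0, 1, x; 2m + 3, 2n + 1)(0, 2, 1, x; m + 1, n) \\
& - (0, x; m + 3, n + 1)(0, 3, 2, 1, x; m, n) \Big].
\end{aligned} \tag{9}$$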

(d) l = 5. In the evaluation of the distribution of the largest root for l = 5,
the following parts need to be calculated:

(a, 4, 3, 2, 1, b; m, n), (a, 4, b, 3, 2, 1, c; m, n), (a, 4, 3, b, 2, 1, c; m, n),
(a, 4, 3, 2, b, 1, c; m, n).

Proceeding along the lines indicated in the previous sections we get

$$\begin{aligned}
\Pr(\theta_1 \le x) = \frac{C(5, m, n)}{m + n + 5} \Big[ & 2(0, 1, x; 2m + 7, 2n + 1)(0, 3, 2, 1, x; m, n) \\
& - \frac{2(0, 1, x; 2m + 6, 2n + 1)}{m + n + 4} \big\{ 2(0, 1, x; 2m + 4, 2n + 1)(0, 1, x; m, n) \\
& \qquad - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 1, n) \\
& \qquad - (0, x; m + 3, n + 1)(0, 2, 1, x; m, n) + (m + 3)(0, 3, 2, 1, x; m, n) \big\} \\
& + \frac{2(0, 1, x; 2m + 5, 2n + 1)}{m + n + 4} \big\{ 2(0, 1, x; 2m + 5, 2n + 1)(0, 1, x; m, n) \\
& \qquad - 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m + 2, n) \\
& \qquad - \frac{(0, x; m + 3, n + 1)}{m + n + 3} \left[ 2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n) \right] \big\} \\
& - 2(0, 3, 2, 1, x; m + 1, n)(0, 1, x; 2m + 4, 2n + 1) \\
& - (0, x; m + 4, n + 1)(0, 4, 3, 2, 1, x; m, n) \Big].
\end{aligned} \tag{10}$$

It is evident now that the above method can be used to derive the distribution
for any value of l.
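The bracketed expressions in the results for l = 3 and l = 4 can be compared with a direct evaluation of the ordered multiple integrals. The sketch below is an independent check under stated assumptions, not the author's procedure: it takes integer m, n ≥ 0, expands the product of differences as a determinant of powers (one signed monomial per permutation), and integrates each monomial exactly from the innermost variable outward. All function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def weight(m, n, shift=0):
    """Coefficients (lowest power first) of y^(m + shift) * (1 - y)^n."""
    return [Fraction(0)] * (m + shift) + [Fraction(comb(n, j) * (-1) ** j) for j in range(n + 1)]


def multiply(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def antiderivative(p):
    """Antiderivative that vanishes at 0."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def evaluate(p, t):
    return sum((c * t ** k for k, c in enumerate(p)), Fraction(0))


def ordered(l, x, m, n):
    """(0, l, ..., 2, 1, x; m, n) for integers m, n >= 0, computed exactly."""
    total = Fraction(0)
    for perm in permutations(range(l)):
        inversions = sum(perm[i] > perm[j] for i in range(l) for j in range(i + 1, l))
        inner = [Fraction(1)]
        for i in reversed(range(l)):  # integrate theta_l first, theta_1 last
            inner = antiderivative(multiply(weight(m, n, l - 1 - perm[i]), inner))
        total += (-1) ** inversions * evaluate(inner, x)
    return total


def edge(x, m, n):
    """(0, x; m, n) = x^m (1 - x)^n, for m > 0."""
    return x ** m * (1 - x) ** n


def formula_8(x, m, n):
    """Bracket of the l = 3 result divided by (m + n + 3)."""
    i1 = lambda p, q: ordered(1, x, p, q)  # (0, 1, x; p, q)
    return (2 * i1(2 * m + 3, 2 * n + 1) * i1(m, n)
            - 2 * i1(2 * m + 2, 2 * n + 1) * i1(m + 1, n)
            - edge(x, m + 2, n + 1) * ordered(2, x, m, n)) / (m + n + 3)


def formula_9(x, m, n):
    """Bracket of the l = 4 result divided by (m + n + 4)."""
    i1 = lambda p, q: ordered(1, x, p, q)
    i2 = lambda p, q: ordered(2, x, p, q)  # (0, 2, 1, x; p, q)
    inner = (2 * i1(2 * m + 2, 2 * n + 1) - edge(x, m + 2, n + 1) * i1(m, n)
             + (m + 2) * i2(m, n)) / (m + n + 3)
    return (2 * i1(2 * m + 5, 2 * n + 1) * i2(m, n)
            - 2 * i1(2 * m + 4, 2 * n + 1) * inner
            + 2 * i1(2 * m + 3, 2 * n + 1) * i2(m + 1, n)
            - edge(x, m + 3, n + 1) * ordered(3, x, m, n)) / (m + n + 4)


x = Fraction(1, 2)
print(ordered(3, x, 1, 1) == formula_8(x, 1, 1))
print(ordered(4, x, 0, 0) == formula_9(x, 0, 0))
```

The permutation expansion grows as l!, so this is practical only for the small values of l treated here; for fractional m or n a numerical quadrature would have to replace the polynomial arithmetic.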


5. Distribution of the smallest root. Let $\Pr[\theta_1 \le x \mid \mu, \nu] = P(x \mid \mu, \nu)$, where
$\theta_1$ is the largest root. Let us make the following transformations in the $R(l, \mu, \nu)$
distribution:

$$r_1 = 1 - \theta_l, \quad r_2 = 1 - \theta_{l-1}, \quad \dots, \quad r_l = 1 - \theta_1;$$

then since $0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1$, we have $0 < r_l < r_{l-1} < \cdots < r_1 < 1$,
and thus the domain of integration does not change. Hence the joint
distribution of the r's can be expressed as

$$C(l, m, n) \prod_{i=1}^{l} r_i^n (1 - r_i)^m \prod_{i<j} (r_i - r_j), \qquad 0 < r_l < \cdots < r_1 < 1.$$

Thus the r's have the same distribution as the $\theta$'s, but $\mu$ and $\nu$ are interchanged.
Therefore

$$\Pr(\theta_l \le x) = \Pr(1 - r_1 \le x) = 1 - \Pr(r_1 \le 1 - x) = 1 - P(1 - x \mid \nu, \mu).$$

Hence, for getting the distribution of the smallest root, we have to change x
into 1 - x and interchange m, n in the distributions of the largest roots and subtract
the resultant probability from 1. The distributions for the smallest root
are given below for l = 2, 3, 4 and 5.
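The rule just stated can be illustrated for l = 2. The sketch below assumes integer m, n ≥ 0 and fixes the normalising constant by requiring total probability 1; `smallest_root_cdf_direct` is an independent check that integrates the density over the region where both roots exceed x. The function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def weight(m, n, shift=0):
    """Coefficients (lowest power first) of y^(m + shift) * (1 - y)^n."""
    return [Fraction(0)] * (m + shift) + [Fraction(comb(n, j) * (-1) ** j) for j in range(n + 1)]


def multiply(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def antiderivative(p):
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]


def evaluate(p, t):
    return sum((c * t ** k for k, c in enumerate(p)), Fraction(0))


def closed_form_l2(x, m, n):
    """[2(0,1,x;2m+1,2n+1) - (0,x;m+1,n+1)(0,1,x;m,n)] / (m + n + 2)."""
    big = evaluate(antiderivative(weight(2 * m + 1, 2 * n + 1)), x)
    small = evaluate(antiderivative(weight(m, n)), x)
    return (2 * big - x ** (m + 1) * (1 - x) ** (n + 1) * small) / (m + n + 2)


def largest_root_cdf_l2(x, m, n):
    return closed_form_l2(x, m, n) / closed_form_l2(Fraction(1), m, n)


def smallest_root_cdf_l2(x, m, n):
    """Change x into 1 - x, interchange m and n, subtract from 1."""
    return 1 - largest_root_cdf_l2(1 - x, n, m)


def smallest_root_cdf_direct(x, m, n):
    """1 - Pr(x < theta_2 < theta_1 < 1), integrating the l = 2 density directly."""
    w0 = antiderivative(weight(m, n))
    w1 = antiderivative(weight(m, n, 1))
    a = antiderivative(multiply(weight(m, n, 1), w0))  # theta_1 w(theta_1) W0(theta_1)
    b = antiderivative(multiply(weight(m, n), w1))     # w(theta_1) W1(theta_1)
    one = Fraction(1)

    def region(lo):
        main = (evaluate(a, one) - evaluate(a, lo)) - (evaluate(b, one) - evaluate(b, lo))
        # Corrections because the inner integral starts at lo rather than at 0.
        corr = (-evaluate(w0, lo) * (evaluate(w1, one) - evaluate(w1, lo))
                + evaluate(w1, lo) * (evaluate(w0, one) - evaluate(w0, lo)))
        return main + corr

    return 1 - region(x) / region(Fraction(0))


print(smallest_root_cdf_l2(Fraction(1, 2), 0, 0))
```

For m = n = 0 the largest root has distribution function $x^3$, so the smallest root has $1 - (1 - x)^3$.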

(i) l = 2.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= 1 - \Pr(\theta_1 \le 1 - x \mid n, m) \\
&= 1 - \frac{C(2, n, m)}{m + n + 2} \big\{ 2(0, 1, 1 - x; 2n + 1, 2m + 1) - (0, 1 - x; n + 1, m + 1)(0, 1, 1 - x; n, m) \big\}.
\end{aligned} \tag{11}$$

(ii) l = 3.

$$\begin{aligned}
\Pr(\theta_3 \le x) = 1 - \frac{C(3, n, m)}{m + n + 3} \big\{ & 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n, m) \\
& - 2(0, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 2, 2m + 1) \\
& - (0, 1 - x; n + 2, m + 1)(0, 2, 1, 1 - x; n, m) \big\}.
\end{aligned} \tag{12}$$

(iii) l = 4.

$$\begin{aligned}
\Pr(\theta_4 \le x) = 1 - \frac{C(4, n, m)}{m + n + 4} \Big\{ & 2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 2, 1, 1 - x; n, m) \\
& - \frac{2(0, 1, 1 - x; 2n + 4, 2m + 1)}{m + n + 3} \big[ 2(0, 1, 1 - x; 2n + 2, 2m + 1) \\
& \qquad - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m) \big] \\
& + 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 2, 1, 1 - x; n + 1, m) \\
& - (0, 1 - x; n + 3, m + 1)(0, 3, 2, 1, 1 - x; n, m) \Big\}.
\end{aligned} \tag{13}$$

(iv) l = 5.

$$\begin{aligned}
\Pr(\theta_5 \le x) = 1 - \frac{C(5, n, m)}{m + n + 5} \Big[ & 2(0, 1, 1 - x; 2n + 7, 2m + 1)(0, 3, 2, 1, 1 - x; n, m) \\
& - \frac{2(0, 1, 1 - x; 2n + 6, 2m + 1)}{m + n + 4} \big\{ 2(0, 1, 1 - x; 2n + 4, 2m + 1)(0, 1, 1 - x; n, m) \\
& \qquad - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 1, m) \\
& \qquad - (0, 1 - x; n + 3, m + 1)(0, 2, 1, 1 - x; n, m) + (n + 3)(0, 3, 2, 1, 1 - x; n, m) \big\} \\
& + \frac{2(0, 1, 1 - x; 2n + 5, 2m + 1)}{m + n + 4} \big\{ 2(0, 1, 1 - x; 2n + 5, 2m + 1)(0, 1, 1 - x; n, m) \\
& \qquad - 2(0, 1, 1 - x; 2n + 3, 2m + 1)(0, 1, 1 - x; n + 2, m) \\
& \qquad - \frac{(0, 1 - x; n + 3, m + 1)}{m + n + 3} \left[ 2(0, 1, 1 - x; 2n + 2, 2m + 1) - (0, 1 - x; n + 2, m + 1)(0, 1, 1 - x; n, m) + (n + 2)(0, 2, 1, 1 - x; n, m) \right] \big\} \\
& - 2(0, 3, 2, 1, 1 - x; n + 1, m)(0, 1, 1 - x; 2n + 4, 2m + 1) \\
& - (0, 1 - x; n + 4, m + 1)(0, 4, 3, 2, 1, 1 - x; n, m) \Big].
\end{aligned} \tag{14}$$


6. Distribution of any intermediate root.

(i) l = 3.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= \Pr(0 < \theta_3 < \theta_2 < \theta_1 < x) + \Pr(0 < \theta_3 < \theta_2 < x < \theta_1) \\
&= C(3, m, n) \left[ (0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1; m, n) \right]
\end{aligned}$$

as the two events are mutually exclusive, or

$$\begin{aligned}
\Pr(\theta_2 \le x) &= C(3, m, n) \left[ (0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n) \right], \quad \text{where } z = 1, \\
&= \frac{C(3, m, n)}{m + n + 3} \Big\{ 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) \\
&\qquad - 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) \\
&\qquad - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n) \\
&\qquad + (x, 1, z; m, n) \left[ 2(0, 1, x; 2m + 3, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m + 1, n) \right] \\
&\qquad - \frac{(x, z; m + 2, n + 1)}{m + n + 2} \left[ 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n) \right] \\
&\qquad + (x, 1, z; m + 1, n)(0, 1, x; m, n)(0, x; m + 2, n + 1) \\
&\qquad - 2(x, 1, z; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) \Big\}.
\end{aligned} \tag{15}$$





(ii) l = 4.

$$\begin{aligned}
\Pr(\theta_2 \le x) &= \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x; m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1; m, n) \\
&= C(4, m, n) \left[ (0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n) \right]
\end{aligned}$$

and

$$\begin{aligned}
\Pr(\theta_3 \le x) &= \Pr(0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x; m, n) + \Pr(0 < \theta_4 < \theta_3 < \theta_2 < x < \theta_1; m, n) \\
&\qquad + \Pr(0 < \theta_4 < \theta_3 < x < \theta_2 < \theta_1; m, n) \\
&= C(4, m, n) \left[ (0, 4, 3, 2, 1, x; m, n) + (0, 4, 3, 2, x, 1; m, n) + (0, 4, 3, x, 2, 1; m, n) \right].
\end{aligned}$$

The different parts of these probabilities can be evaluated as indicated in section
4(d). Thus the method already indicated to obtain the distribution of the
largest root also gives the distribution of any one of the roots.

7. Further problems. It is intended to prepare the probability distribution
tables for small values of l. The results obtained in this paper are found to be
useful in finding the distribution of the sum of the roots when the numbers of
canonical variates in two sets differ by one. This problem is, however, being
investigated further.

Acknowledgments. The author is highly indebted to Dr. P. L. Hsu for suggesting
the problem and for guiding in this research, and is also thankful to Dr.
Harold Hotelling for his suggestions and help in this work.

REFERENCES

[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations", Annals of
Eugenics, Vol. 9 (1939), pp. 250-258.

[2] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and
any intermediate of the 'p'-statistics on the null hypothesis", Sankhyā, December,
1943.



A k-SAMPLE SLIPPAGE TEST FOR AN EXTREME POPULATION

By Frederick Mosteller
Harvard University

1. Summary. A test is proposed for deciding whether one of k populations
has slipped to the right of the rest, under the null hypothesis that all populations
are continuous and identical. The procedure is to pick the sample with the largest
observation, and to count the number of observations r in it which exceed all
observations of all other samples. If all samples are of the same size n, n large,
the probability of getting r or more such observations, when the null hypothesis
is true, is about $k^{1-r}$.

Some remarks are made about kinds of errors in testing hypotheses.


2. Introduction. The purpose of this paper is to describe a significance test
connected with a statistical question called by the present author "the problem
of the greatest one." Suppose there are several continuous populations $f(x - a_1)$,
$f(x - a_2), \dots, f(x - a_k)$, which are identical except for rigid translations or
slippages. Suppose further that the form of the populations and the values of
the $a_i$ are unknown. Then on the basis of samples from the k populations we
may wish to test the hypothesis that some population has slipped further to the
right, say, than any other. In other words, we may ask whether there exists an

$$a_i > \max(a_1, a_2, \dots, a_{i-1}, a_{i+1}, \dots, a_k).$$

From the point of view of testing
hypotheses, the existence of such an $a_i$ is taken to be the alternative hypothesis.
A significance test will depend also on the null hypothesis. We shall take as the
null hypothesis the assumption that all the a's are equal: $a_1 = a_2 = \cdots = a_k$.

Using these assumptions it is possible to obtain parameter-free significance
tests that some population has a larger location parameter (mean, median, quantile,
say) than any of the other populations.

The problem of the greatest one is of considerable practical importance.
Among several processes, techniques, or therapies of approximately equal cost,
we often wish to pick out the best one as measured by some characteristic.
Furthermore, we often wish to make a test of the significance of one of the
methods against the others after noticing that on the basis of the sample values,
a particular method seems to be best. The test provided in this paper allows

… being rapid and easy to apply. However,
the test is probably not very powerful, and in the form presented here the

no vorv ZZ? ' 1 If'■'« tochnique, but «n«. 

«, ,s'knoZ tl rt.. "SmloMoo Wcls te the Uuequal-Mmplo 




3. The test. Suppose we have k samples of size n each. It is desired to
test the alternative hypothesis that one of the populations, from which the
samples were drawn, has been rigidly translated to the right relative to the remaining
populations. The null hypothesis is that all the populations have the
same location parameter.

The test consists in arranging the observations in all the samples from greatest
to least, and observing for the sample with the largest observation, the number
of observations r which exceed all the observations in the k - 1 other samples.
If $r \ge r_0$ we accept the hypothesis that the population whose sample contains the
largest observation has slipped to the right of the rest and reject the null hypothesis
that all the populations are identical; instead we accept the hypothesis
that the sample with the largest observation came from the population with the
rightmost location parameter. If $r < r_0$, we accept the null hypothesis.

The statements just made are not quite usual for accepting and rejecting
hypotheses. Classically one would merely accept or reject the hypothesis that
the $a_i$ are all equal. The statements just made seem preferable for the present
purpose.

Example. The following data arranged from least to greatest indicate the
difference in log reaction times of an individual and a control group to three
types of words on a word-association test. The differences in log reaction
times have been multiplied by 100 for convenience. Longer reaction times for
the individual are positive, shorter ones are negative. Does one type of word
require a shorter reaction time for the individual relative to the control group
than any other?


Concrete    Abstract    Emotional
   -6         -16          -6
   -6         -11          -5
   -5          -3          -3
   -5          -2          -2
   -4          -2          -1
   -3          -1           0
   -1          -1           1
    0           1           3
    0           1           5
    3           1          12
    9           8          13
   11          10          13
   12          16          15
   29          20          28


Here we have k = 3 samples of size n = 14 each. We note that the Abstract
column has the most negative deviation, -16, and that there are two observations
in that column which are less than all the observations in the other columns.
Consequently r = 2. Under the null hypothesis the probability of obtaining
2 or more observations in one column less than all the observations in
the others is about .33, so the null hypothesis is not rejected.
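The count in this example can be reproduced mechanically. The sketch below is illustrative only: the function names are assumptions, the first Concrete entry is taken as -6 (the printed digit is uncertain, but any value above -11 gives the same r), and the tail probability uses the form $P(r) = k\,C^n_r / C^{kn}_r$ derived in section 4. Because the question concerns shorter reaction times, the test is applied to the lower tail by changing signs.

```python
from math import comb

# Data from the example; the first Concrete value is an assumed reading.
concrete = [-6, -6, -5, -5, -4, -3, -1, 0, 0, 3, 9, 11, 12, 29]
abstract = [-16, -11, -3, -2, -2, -1, -1, 1, 1, 1, 8, 10, 16, 20]
emotional = [-6, -5, -3, -2, -1, 0, 1, 3, 5, 12, 13, 13, 15, 28]


def slippage_statistic(samples, lower_tail=False):
    """Return (index of the sample holding the extreme observation, r)."""
    if lower_tail:  # test for slippage to the left by reflecting the data
        samples = [[-v for v in s] for s in samples]
    tops = [max(s) for s in samples]
    winner = tops.index(max(tops))
    rival = max(t for i, t in enumerate(tops) if i != winner)
    r = sum(v > rival for v in samples[winner])
    return winner, r


def tail_probability(k, n, r):
    """P(r) = k * C(n, r) / C(kn, r): chance of r or more under the null hypothesis."""
    return k * comb(n, r) / comb(k * n, r)


winner, r = slippage_statistic([concrete, abstract, emotional], lower_tail=True)
print(winner, r, round(tail_probability(3, 14, r), 3))
```

For k = 3, n = 14, r = 2 the exact form gives roughly .32, close to the large-sample value $k^{1-r} = 1/3$ quoted in the text.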


4. Derivation of test. Suppose we have k samples of size n, all from
the same continuous distribution function f(x). Arranging observations within
samples in order of magnitude the samples $O_i$ are: $O_1$: $x_{11}, x_{12}, \dots, x_{1n}$; $O_2$:
$x_{21}, x_{22}, \dots, x_{2n}$; $\dots$; $O_k$: $x_{k1}, x_{k2}, \dots, x_{kn}$.

If we consider some one sample $O_i$ separately, we can inquire about the
probability that exactly r of its observations are greater than the greatest observation
in the other k - 1 samples.

The total number of arrangements of the kn observations is

$$T = \frac{(kn)!}{(n!)^k}. \tag{1}$$

The number of ways of getting all n observations of $O_i$ to be greater than all
observations in the remaining samples is

$$N(n) = \frac{[(k - 1)n]!}{(n!)^{k-1}\, 0!}. \tag{2}$$

The number of ways of getting exactly n - 1 observations of $O_i$ greater than
all observations in the remaining samples is

$$N(n - 1) = \frac{[(k - 1)n + 1]!}{(n!)^{k-1}\, 1!} - \frac{[(k - 1)n]!}{(n!)^{k-1}\, 0!}. \tag{3}$$

More generally, the number of ways of getting exactly r = n - u of $O_i$ to be
greater than all other observations in the remaining samples is

$$N(n - u) = \frac{[(k - 1)n + u]!}{(n!)^{k-1}\, u!} - \frac{[(k - 1)n + u - 1]!}{(n!)^{k-1}\, (u - 1)!}. \tag{4}$$

Therefore the number of ways of getting a run of r = n - u or more observations
in $O_i$ greater than the rest is just

$$S(n - u) = \sum_{t=n-u}^{n} N(t) = \frac{[(k - 1)n + u]!}{(n!)^{k-1}\, u!}. \tag{5}$$

However we do not choose our sample $O_i$ at random or preassign it as the
demonstration has thus far supposed. Instead we choose that $O_i$ which has
the largest observation. Thus the probability that the sample
with the largest observation has r or more observations exceeding all
observations in the other samples is

$$P(r) = \frac{k\, S(r)}{T} = \frac{k\, n!\, (kn - r)!}{(kn)!\, (n - r)!}. \tag{6}$$
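Equation (6) can be confirmed by brute force for very small k and n. The following sketch enumerates every distinct ordering of the sample labels (there are T of them, all equally likely under the null hypothesis) and counts those in which the leading run of labels has length r or more. The function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_formula(k, n, r):
    """Equation (6): P(r) = k n! (kn - r)! / ((kn)! (n - r)!)."""
    return Fraction(k * factorial(n) * factorial(k * n - r),
                    factorial(k * n) * factorial(n - r))


def p_enumerated(k, n, r):
    """Share of label orderings whose top r observations come from one sample."""
    labels = [i for i in range(k) for _ in range(n)]
    arrangements = set(permutations(labels))  # T = (kn)! / (n!)^k distinct orderings
    hits = 0
    for arr in arrangements:  # arr[0] labels the largest observation
        run = 1
        while run < len(arr) and arr[run] == arr[0]:
            run += 1
        hits += run >= r
    return Fraction(hits, len(arrangements))


print(p_formula(2, 3, 2), p_enumerated(2, 3, 2))
```

The enumeration grows as (kn)!, so it is only a check for toy cases such as k = 2, n = 3 or k = 3, n = 2.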



As an incidental check we note in passing that

$$P(1) = \frac{k\, n!\, (kn - 1)!}{(kn)!\, (n - 1)!} = \frac{kn}{kn} = 1.$$

We note that equation (6) may be rewritten as

$$P(r) = k\, C^{n}_{r} / C^{kn}_{r}, \tag{7}$$

which is a useful form for some computations.

Table I gives the probability of observing r or more observations in the
sample with the largest observation, among k samples of size n, which are more
extreme in a preassigned direction than any of the observations in the remaining
k - 1 samples.


5. Approximations. If we use Stirling's formula and approximations for
$(1 + a)^r$, for small values of a and r, we can write an approximation for equation
(6) for large values of n with r and k fixed as follows:

$$P(r) \approx \frac{1}{k^{r-1}} \left( 1 - \frac{(k - 1)\, r (r - 1)}{2kn} \right). \tag{8}$$

For very large n equation (8) yields

$$P(r) \approx \frac{1}{k^{r-1}}, \tag{9}$$

which is the value given in Table I for n = ∞. For many purposes the result
given by equation (9) is quite adequate, as a glance at Table I will indicate.
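The quality of the two approximations can be gauged against the exact value from (7). This is a small sketch with illustrative function names; the first-order correction term is written in the form $(k - 1) r (r - 1) / (2kn)$, which follows from expanding the product $k \prod_{j=0}^{r-1} (n - j)/(kn - j)$ for large n.

```python
from math import comb


def p_exact(k, n, r):
    """Equation (7): k * C(n, r) / C(kn, r)."""
    return k * comb(n, r) / comb(k * n, r)


def p_first_order(k, n, r):
    """Equation (8): k^(1 - r) * (1 - (k - 1) r (r - 1) / (2 k n))."""
    return k ** (1 - r) * (1 - (k - 1) * r * (r - 1) / (2 * k * n))


def p_limit(k, r):
    """Equation (9): the large-n value k^(1 - r)."""
    return k ** (1 - r)


for n in (5, 10, 25, 100):
    print(n, round(p_exact(3, n, 3), 4), round(p_first_order(3, n, 3), 4), round(p_limit(3, 3), 4))
```

For small n the first-order form (8) can be noticeably off, so the exact expression (7) is the safer choice whenever n is not large relative to r.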


6. Kinds of errors. In tests such as the one being considered here the classical
two kinds of errors are not quite adequate to describe the situation.

As usual we may make the errors of

I) rejecting the null hypothesis when it is true,

II) accepting the null hypothesis when it is false.

But there is a third kind of error which is of interest because the present test of
significance is tied up closely with the idea of making a correct decision about
which distribution function has slipped furthest to the right. We may make
the error of

III) correctly rejecting the null hypothesis for the wrong reason.

In other words it is possible for the null hypothesis to be false. It is also possible
to reject the null hypothesis because some sample $O_i$ has too many observations
which are greater than all observations in the other samples. But
the population from which some other sample, say $O_j$, is drawn is in fact the rightmost
population. In this case we have committed an error of the third kind.

When we come to the power of the test under consideration we shall compute
the probability that we reject the null hypothesis because the rightmost population
yields a sample with too many large observations. Thus by the power of
this test we shall mean the probability of both correct rejection and correct
choice of rightmost population, when it exists.

TABLE I

Probability of observing r or more observations, in the sample with the largest
observation among k samples of size n, which exceed all observations in the
remaining k - 1 samples

k = 2
  n      r = 2    r = 3
  3      .400     .100
  5      .444     .167
  7      .462     .192
 10      .474     .211
 15      .483     .224
 20      .487     .231
 25      .490     .235
  ∞      .500     .250

k = 3
  n      r = 2    r = 3    r = 4    r = 5    r = 6
  3      .250     .036
  5      .286     .066     .011     .001
  7      .300     .079     .018     .003     .0004
 10      .310     .089     .023     .005     .0011
 15      .318     .096     .027     .007     .0018
 20      .322     .100     .030     .009     .0023
 25      .324     .102     .031     .009     .0026
  ∞      .333     .111     .037     .012     .0041

k = 5
  n      r = 2    r = 3    r = 4    r = 5
  3      .143     .011
  5      .167     .022     .0020    .0001
  7      .177     .027     .0033    .0003
 10      .184     .031     .0046    .0006
 15      .189     .034     .0056    .0008
 20      .192     .035     .0062    .0010
 25      .194     .036     .0065    .0011
  ∞      .200     .040     .0080    .0016

Errors of the third kind happen in conventional tests of differences of means,
but they are usually not considered although their existence is probably recognized.
It seems to the author that there may be several reasons for this among
which are 1) a preoccupation on the part of mathematical statisticians with the
formal questions of acceptance and rejection of null hypotheses without adequate
consideration of the implications of the error of the third kind for the
practical experimenter, 2) the rarity with which an error of the third kind arises
in the usual tests of significance.

In passing we note further that it is possible in the present problem for both
the null hypothesis and the alternative hypothesis to be false when k > 2. This
may happen when there are, say, two identical rightmost populations, and the
remaining populations are shifted to the left. An examination of Table I will
give us an idea of what will happen in such a case. If k = 4, we use r = 3 as
about the .05 level. If two of the populations are slipped very far to the left,
while the rightmost two populations are identical, in effect k = 2. In this case
the probability of rejecting the null hypothesis is around .2. Consequently we
accept the null hypothesis about 80 per cent of the time, and reject it 20 per cent
of the time under these conditions. But neither hypothesis was true.

If we carry the discussion to its ultimate conclusion we would need a fourth
kind of error for these troublesome situations. There are still other kinds of
errors which will not be considered here.

7. The power of the test. It is difficult to discuss the power of a non-parametric
test, but in the present case it may be worthwhile to give an example or
two. The reader will understand that although the test is called non-parametric,
its power does depend on the distribution function.

In the case of k samples there are two extremes which might be considered for
any particular form of distribution function. In Case 1, we suppose that
when the alternative hypothesis is true, k - 1 of the populations are identical
with distribution function f(x), while the remaining distribution function is
f(x - a), a > 0. Case 1 may be regarded as a lower bound to the power of the
test because for any fixed distance a between the location parameters of the
rightmost population and the next rightmost population, Case 1 gives the least
chance of detecting the falsity of the null hypothesis.

In Case 2, we suppose that the rightmost population is f(x - a), a > 0 as
before, that the next rightmost population is f(x), and that the other k - 2
populations have slipped so far to the left that they make no contribution to the
problem of the power. This is an optimistic approach to the power because it
gives an upper bound to the power. When k = 2, Case 1 and Case 2 are identical,
and the power is exactly the power of the test for the particular distribution function
under consideration.

Case 3, which we shall not consider, deals with the situation where there is more
than one rightmost population, but the null hypothesis is false. It is connected
with the fourth kind of error mentioned at the end of section 6.

Table II gives the upper and lower bound of the power of the test for k = 3,
r = 3, n = 3, when the distribution is uniform and of length unity. The parameter
a is the distance between the location parameter of the rightmost distribution
and that of the next rightmost distribution.

In Table III we give some points on the upper and lower bounds of the power
of the test for the normal distribution with unit standard deviation. The parameter
a is the distance between the mean of the rightmost normal distribution and
the next rightmost, measured in standard deviations. Again we use the case
k = 3, r = 3, n = 3.
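For the uniform case the two bounds can be computed by one-dimensional quadrature. The sketch below rests on an interpretation that should be read as an assumption: with r = n = 3, rejection with the correct choice occurs when all n observations of the shifted sample exceed every rival observation, where the rivals are the (k - 1)n observations of Case 1 or the n observations of the single relevant population in Case 2. The function names and the midpoint rule are illustrative choices.

```python
def uniform_power(a, rivals, n=3, steps=20000):
    """P(all n observations of the shifted uniform sample exceed `rivals` unshifted ones).

    The shifted sample is uniform on [a, 1 + a]; the rivals are uniform on [0, 1].
    """
    total = 0.0
    h = 1.0 / steps
    for i in range(steps):
        v = (i + 0.5) * h                   # midpoint rule over the sample minimum
        density = n * (1.0 - v) ** (n - 1)  # density of the minimum of n uniforms
        below = min(1.0, a + v) ** rivals   # chance that every rival falls below a + v
        total += density * below * h
    return total


def power_bounds(a, k=3, n=3):
    lower = uniform_power(a, (k - 1) * n, n)  # Case 1: k - 1 identical rivals
    upper = uniform_power(a, n, n)            # Case 2: one relevant rival
    return lower, upper


for a in (0.0, 0.5, 1.0):
    lower, upper = power_bounds(a)
    print(a, round(lower, 2), round(upper, 2))
```

At a = 0 the two bounds reduce to 1/84 and 1/20, the chance that a preassigned sample of three supplies the three largest of nine or of six observations.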


TABLE II

Power p of the test for the uniform distribution when k = 3, r = 3, n = 3. The
distance between the midpoints of the two rightmost distributions is a

a                    0     .1     .3     .5     .7     .9    1.00
Upper bound p_u     .05    .09    .23    .46    .73    .96   1.00
Lower bound p_l     .01    .03    .11    .29    .59    .93   1.00


TABLE III

Power p of the test for the unit normal when k = 3, r = 3, n = 3. The distance
between the means of the two rightmost distributions, measured in standard
deviations, is a

a                    0     .5    1.0    1.5    2.0    2.5    3.0
Upper bound p_u     .05    .13    .26    .42    .58    .71    .87
Lower bound p_l     .01    .04    .14    .27    .43    .60    .80

The power of the test has been defined as the probability of correctly rejecting
the null hypothesis and finding the sample from the rightmost population to be
the extreme one. This raises a question about the meaning of the entries in
Tables II and III under a = 0. When a = 0 there is no way to reject the null
hypothesis correctly. The probabilities given are the probabilities that a
randomly chosen sample will force a rejection of the null hypothesis. They
represent the limit of the power function as a tends to zero. If we think of earmarking
the sample from the rightmost population and of computing the probability
repeatedly that that sample will have three observations larger than all the
observations in the other samples, and then we let a tend to zero, this is the result
we get. These values are not the significance levels. The significance level is


















8. Discussion. The reader may rightly feel that the solution here presented
to the problem of the greatest one depends on a trick. That is, it depends
intimately on the choice of the null hypothesis. Furthermore the reader may
feel that the choice of $a_1 = a_2 = \cdots = a_k$ is neither an interesting null hypothesis
nor one which is likely to arise in a practical situation. The author has
no quarrel with this attitude. This means that there are many other approaches
to this problem which are worth trying. The equal-location-parameter case is
one which yields easily to non-parametric methods.

It will be noted that a useful technique has been indicated which allows one
to examine the data before making the significance test. In general one may
wish to set up a test function, decide which of several samples provides the extreme
value of the function, and then test significance given that we have chosen
that sample which maximizes the function among the k samples under consideration.

9. Conclusion. There is a large class of problems grouped around "the problem
of the greatest one". First, it would be useful to have a more powerful test
than the one here proposed. Second, there is the problem of deciding on the
basis of samples whether we have successfully predicted the order of the location
parameters of several populations. Third, there is the general problem of what
alternatives, what null hypotheses, and what test functions to use in treating
samples from more than two populations. It is to be hoped that more material
on these problems will appear, because answers to these questions are urgently
needed in practical problems.



ON THE UNIQUENESS OF SIMILAR REGIONS 

By Paul G. Hoel
University of California at Los Angeles

1. Summary. Conditions are determined for insuring that Neyman's method
of constructing similar regions by means of sufficient statistics will yield all such
regions when such statistics exist.


2. Introduction. In designing tests of composite hypotheses, one encounters
the problem of how to construct similar regions and whether the construction
process yields all possible similar regions. Neyman has derived methods for obtaining
similar regions when the basic distribution function satisfies certain partial
differential equations [1] and also when a sufficient set of statistics exists for
the unknown parameters [2]. In the former case, the construction process gave
all such regions; however the question of whether certain subregions were independent
of the parameters was left unanswered. In the latter case, the independence
was obvious, but the question of uniqueness was not considered. In
obtaining sufficient conditions for the existence of a type B region, Scheffé [3]
employed Neyman's differential equations assumptions and methods and demonstrated
that the subregions were independent of the parameters.

The method of constructing similar regions by means of sufficient statistics is
much simpler to demonstrate than is the method based on differential equations.
It also has the advantage that the independence of the subregions requires no
proof. It possesses the disadvantage that the question of uniqueness is not
answered. This question can be answered by showing that the assumption of a
sufficient set of statistics includes the differential equations assumption and then
employing methods based on the latter assumption. Such a procedure would
deprive the sufficiency method of its simplicity; consequently a relatively simple
direct proof of uniqueness has been constructed. The method of proof also shows
the equivalence of the two methods of constructing similar regions.


3. Sufficient conditions for uniqueness. Consider a distribution function,
$f(x \mid \theta_1, \dots, \theta_\nu)$, of the variable $x$ that depends upon the $\nu$ parameters
$\theta_1, \dots, \theta_\nu$. Let $x_1, \dots, x_n$ denote a random sample from this distribution and let
$f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu)$ denote the distribution function of such a sample. It
will be assumed that $n > \nu$.

Suppose there exists a sufficient set of statistics $T_1(x_1, \dots, x_n), \dots,$
$T_\mu(x_1, \dots, x_n)$ with respect to the parameters $\theta_1, \dots, \theta_\nu$. Koopman [4] has
shown that if the $T$'s are continuous and if $f(x \mid \theta_1, \dots, \theta_\nu)$ is analytic, then
$f(x \mid \theta_1, \dots, \theta_\nu)$ must be a function of the form

(1) $f(x \mid \theta_1, \dots, \theta_\nu) = \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k X_k + \Theta + X\Bigr]$,



where the $\Theta_k$ and $\Theta$ are single-valued analytic functions of the $\theta$'s only, and the
$X_k$ and $X$ are single-valued analytic functions of $x$ only. He has also shown that
if $\mu$ assumes its smallest possible value, then

(2) $\sum_{i=1}^{n} X_k(x_i) = V_k(T_1, \dots, T_\mu)$,

where the $V_k$ are single-valued functions of the $T$'s. If the preceding conditions
are satisfied, it follows from (1) and (2) that

(3) $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu) = \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + \sum_{i=1}^{n} X(x_i)\Bigr]$.
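
As an illustration of the forms (1) to (3), and not as part of the original argument, the following sketch takes the normal distribution with mean `mean` and standard deviation `sd` as an assumed example. With $X_1 = x$, $X_2 = x^2$, $X = 0$, $\Theta_1 = m/\sigma^2$, $\Theta_2 = -1/(2\sigma^2)$ and $\Theta = -m^2/(2\sigma^2) - \log(\sigma\sqrt{2\pi})$, the sample density depends on the observations only through $V_1 = \sum x_i$ and $V_2 = \sum x_i^2$. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Ordinary normal density, used as the reference value."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def koopman_terms(mean, sd):
    """Theta_1, Theta_2 and Theta for the normal law written in the form (1)."""
    theta1 = mean / sd ** 2
    theta2 = -1.0 / (2 * sd ** 2)
    theta0 = -mean ** 2 / (2 * sd ** 2) - math.log(sd * math.sqrt(2 * math.pi))
    return theta1, theta2, theta0


def joint_density_via_statistics(sample, mean, sd):
    """Form (3): the sample enters only through V_1 = sum x_i and V_2 = sum x_i^2."""
    theta1, theta2, theta0 = koopman_terms(mean, sd)
    v1 = sum(sample)
    v2 = sum(x * x for x in sample)
    n = len(sample)
    return math.exp(theta1 * v1 + theta2 * v2 + n * theta0)  # here X(x) = 0


def joint_density_direct(sample, mean, sd):
    """Product of the individual densities, for comparison."""
    product = 1.0
    for x in sample:
        product *= normal_pdf(x, mean, sd)
    return product
```

The two evaluations of the sample density should agree up to rounding for any sample and any admissible parameter values, which is the content of (3) in this special case.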

Now it is known [2] that if the $T$'s possess continuous partial derivatives and
are such that it is possible to introduce additional functions $T_{\mu+1}, \dots, T_n$ which
will make the transformation

(4) $T_1 = T_1(x_1, \dots, x_n), \quad \dots, \quad T_n = T_n(x_1, \dots, x_n)$

one-to-one, then $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu)$ can be written in the form

(5) $f(x_1, \dots, x_n \mid \theta_1, \dots, \theta_\nu) = f_1(T_1, \dots, T_\mu \mid \theta_1, \dots, \theta_\nu)\, f_2(x_1, \dots, x_n \mid T_1, \dots, T_\mu)$,

where $f_1$ is the distribution function of the $T$'s and $f_2$ is the conditional distribution
function of the $x$'s for fixed values of the $T$'s. The function $f_2$ does not depend
upon any of the parameters $\theta_1, \dots, \theta_\nu$.

For the purpose of constructing similar regions, it is desirable to work with $f_1$.
By combining (3) and (5), $f_1$ may be expressed in the form

(6) $f_1(T_1, \dots, T_\mu \mid \theta_1, \dots, \theta_\nu) = \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + H\Bigr]$,

where $H = \sum_{i=1}^{n} X(x_i) - \log f_2$ can be expressed as a function of $T_1, \dots, T_\mu$ only,
and where it is assumed that $f_2 > 0$.

The method employed by Neyman to obtain a similar region of size $\alpha$ is to
build it up as the locus of subregions of size $\alpha$ on the "surfaces" obtained by giving
the $T$'s constant values. Since the size of such a subregion is obtained by
integrating $f_2$ over the subregion, it will depend only upon the $T$'s; consequently a
subregion can be selected that will be of size $\alpha$ for every set of values of the $T$'s.

Now consider the construction of a similar region of size $\alpha$ by building up the
region as the locus of subregions of varying size rather than of constant size on
the surfaces that are obtained by giving the $T$'s constant values. Let $w_1$ and $w_2$
be two regions of size $\alpha$ and let $\alpha_1(T_1, \dots, T_\mu)$ and $\alpha_2(T_1, \dots, T_\mu)$ denote the
sizes of the surface subregions. It will be assumed that the regions under
consideration are such that $\alpha_1$ and $\alpha_2$ are obtainable from integrating $f_2$ over the
subregion common to $w_1$ and $w_2$, respectively, and the surface determined by fixing
the values of the $T$'s. The problem then is to determine whether two different
functions, $\alpha_1$ and $\alpha_2$, can yield similar regions of size $\alpha$.

Since a critical region can be obtained as the locus of subregions, $\alpha_1$ and $\alpha_2$ will
yield similar regions of size $\alpha$ only if

(7) $\int \cdots \int \alpha_j f_1 \, dT_1 \cdots dT_\mu = \alpha \qquad (j = 1, 2)$,

where the integration extends over the range of values of the $T$'s. By means of
(6), condition (7) may be written as

(8) $\int \cdots \int \alpha_j \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k V_k + n\Theta + H\Bigr] dT_1 \cdots dT_\mu = \alpha \qquad (j = 1, 2)$.

If $e^{n\Theta}$ is factored out, it is clear that condition (8) will hold only if

(9) $\int \cdots \int \alpha_1 \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k V_k + H\Bigr] dT_1 \cdots dT_\mu = \int \cdots \int \alpha_2 \exp\Bigl[\sum_{k=1}^{\mu} \Theta_k V_k + H\Bigr] dT_1 \cdots dT_\mu$

is an identity in the $\theta$'s, and hence in the $\Theta_k$ for the region in the $\Theta_k$ space that
corresponds to the region in the parameter space for which the parameters
$\theta_1, \dots, \theta_\nu$ are defined.

Now assume that $\mu = \nu$ and that the transformation

(10) $V_1 = V_1(T_1, \dots, T_\nu), \quad \dots, \quad V_\nu = V_\nu(T_1, \dots, T_\nu)$

is one-to-one. From the preceding assumptions that gave rise to (2) and (4), it
may be shown that the $V$'s are continuous and possess continuous partial
derivatives. In terms of the $V$'s, (9) may therefore be written as

(11) $\int \cdots \int \exp\Bigl[\sum_{k=1}^{\nu} \Theta_k V_k\Bigr] K_1 \, dV_1 \cdots dV_\nu = \int \cdots \int \exp\Bigl[\sum_{k=1}^{\nu} \Theta_k V_k\Bigr] K_2 \, dV_1 \cdots dV_\nu$,

where $K_j = \alpha_j e^{H}$ has been expressed in terms of the $V$'s.

Since the parameters will be defined over intervals and $\Theta_k$ is an analytic
function of those parameters, to every region in the parameter space determined by
intervals of the $\theta$'s there will correspond an interval for $\Theta_k$ throughout which $\Theta_k$
will be defined; consequently (11) will be an identity in the $\Theta_k$ for intervals of
values. For every point within regions determined by $\Theta_k$ intervals, the partial
derivatives of the two sides of (11) must therefore be equal, provided the
derivatives exist and provided the $\Theta_k$ are functionally independent.

If the conditions to be imposed shortly are satisfied, it can easily be shown that
it is permissible to differentiate (11) repeatedly under the integral signs with
respect to the $\Theta_k$. As a consequence, (11) implies that for all sets of non-negative
integers $k_1, \dots, k_\nu$,

(12) $\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Bigl[\sum_{k=1}^{\nu} \Theta_k V_k\Bigr] K_1 \, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} \exp\Bigl[\sum_{k=1}^{\nu} \Theta_k V_k\Bigr] K_2 \, dV_1 \cdots dV_\nu$

will be an identity in the $\Theta_k$ for almost all values of the $\Theta_k$. But (12) is equivalent
to requiring that

(13) $\int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} g_1(V_1, \dots, V_\nu) \, dV_1 \cdots dV_\nu = \int \cdots \int V_1^{k_1} \cdots V_\nu^{k_\nu} g_2(V_1, \dots, V_\nu) \, dV_1 \cdots dV_\nu$

shall hold for all sets of non-negative integers $k_1, \dots, k_\nu$, where $g_1$ and $g_2$ are
the integrands of (11) after they have been divided by the function of the $\Theta_k$
obtained from integrating (11). Since $g_1$ and $g_2$ will then be non-negative functions
of the $V$'s whose integrals over all values of the $V$'s are one, they are distribution
functions of the $V$'s. If $g_1$ and $g_2$ possess moments of all orders and are such that
they are uniquely determined by their moments, then condition (13) implies that

(14) $g_1(V_1, \dots, V_\nu) = g_2(V_1, \dots, V_\nu)$.

This identity will hold for almost all values of the parameters. If the conditions
necessary to justify (14) are satisfied, it therefore follows that

$\alpha_1(T_1, \dots, T_\nu) = \alpha_2(T_1, \dots, T_\nu)$,

and that Neyman's method of constructing similar regions by choosing
$\alpha(T_1, \dots, T_\nu) = \alpha$ yields all possible similar regions of the class of regions
being considered.

The conditions that were imposed on $f(x \mid \theta_1, \dots, \theta_\nu)$ in order to establish
uniqueness may be summarized as follows: The distribution function
$f(x \mid \theta_1, \dots, \theta_\nu)$ is analytic and possesses a set of sufficient statistics, $T_1, \dots, T_\nu$,
with respect to the parameters $\theta_1, \dots, \theta_\nu$, that are continuous and possess
continuous partial derivatives. There exist one-to-one transformations of the types
(4) and (10). The function $g_j$ of (13), treated as a distribution function of the
$V$'s, possesses moments of all orders and is uniquely determined by its moments.
Finally, the $\Theta_k$ are functionally independent with the smallest possible value of
$\mu$ equal to $\nu$.

If the assumption that the $\Theta_k$ are independent is not realized, the distribution
function (1) could be expressed in terms of fewer than $\nu$ parameters. This is
also true if $\mu < \nu$. The two assumptions that $\mu = \nu$ and that the $\Theta_k$ are
independent will therefore be satisfied if (1) is expressed in terms of the minimum number
of parameters. The remaining assumptions can often be checked quite easily
whenever a particular distribution function is given.

In deriving tests of hypotheses for certain parameters, the distribution function
$f(x \mid \theta_1, \dots, \theta_\nu)$ will of course contain those parameters in addition to the
parameters $\theta_1, \dots, \theta_\nu$, but since they will have fixed values, it was not necessary to
introduce them into the discussion.


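4. Equivalence of methods. Although the equivalence of the two methods
of constructing similar regions has been implied in the literature [1], no simple
demonstration seems to be available. Such a demonstration is easily given by
means of (3). Let

$\varphi_i = \dfrac{\partial \log f}{\partial \theta_i}$,

where $f$ is given by (3) with $\mu = \nu$, and let

$\varphi_{ij} = \dfrac{\partial \varphi_i}{\partial \theta_j}$.

Differentiation of (3) yields

(15) $\varphi_i = \sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k + n \dfrac{\partial \Theta}{\partial \theta_i}, \qquad \varphi_{ij} = \sum_{k=1}^{\nu} \dfrac{\partial^2 \Theta_k}{\partial \theta_i \partial \theta_j} V_k + n \dfrac{\partial^2 \Theta}{\partial \theta_i \partial \theta_j}$.

The differential equations that are assumed to hold in the other method of
construction [1] may be written in the form

(16) $\varphi_{ij} = A_{ij} + \sum_{r=1}^{\nu} B_{ijr} \varphi_r \qquad (i, j = 1, \dots, \nu)$,

where the $A_{ij}$ and $B_{ijr}$ are functions of the $\theta$'s only. Upon substituting the
values given by (15), it will be found that (16) will be satisfied if

(17) $\dfrac{\partial^2 \Theta_k}{\partial \theta_i \partial \theta_j} = \sum_{r=1}^{\nu} B_{ijr} \dfrac{\partial \Theta_k}{\partial \theta_r} \qquad (k = 1, \dots, \nu)$

and

$n \dfrac{\partial^2 \Theta}{\partial \theta_i \partial \theta_j} = A_{ij} + n \sum_{r=1}^{\nu} B_{ijr} \dfrac{\partial \Theta}{\partial \theta_r}$.
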

Since (17) represents a set of $\nu$ equations in the $B_{ijr}$'s, whose coefficient matrix is
non-singular because of the functional independence of the $\Theta_k$, it follows that
sets of $A$'s and $B$'s can be found to satisfy equations (16). This shows that the
sufficiency assumption includes the differential equations assumption.

Now the method of constructing similar regions here consists in building them
up as the locus of subregions of size $\alpha$ on the surfaces obtained by giving the $\varphi_i$
constant values. But from (15) it follows that the surface $\varphi_i = c_i$ $(i = 1, \dots, \nu)$
is equivalent to the surface

$\sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k + n \dfrac{\partial \Theta}{\partial \theta_i} = c_i \qquad (i = 1, \dots, \nu)$,

which may be written in the form

(18) $\sum_{k=1}^{\nu} \dfrac{\partial \Theta_k}{\partial \theta_i} V_k = c_i' \qquad (i = 1, \dots, \nu)$,

because $\Theta$ is a function of the parameters only. Since the coefficient matrix of the
$V$'s in (18) is non-singular, (18) may be solved for the $V$'s; consequently the
surface $\varphi_i = c_i$ $(i = 1, \dots, \nu)$ is equivalent to the surface $V_i = c_i''$ $(i = 1, \dots, \nu)$.
But from the assumption concerning the transformation (10), the surface
$V_i = c_i''$ $(i = 1, \dots, \nu)$ is equivalent to the surface $T_i = c_i'''$ $(i = 1, \dots, \nu)$.
Thus, the two surfaces $\varphi_i = c_i$ $(i = 1, \dots, \nu)$ and $T_i = c_i'''$ $(i = 1, \dots, \nu)$ are
equivalent and hence the two methods of constructing similar regions are
equivalent.


REFERENCES

[1] J. Neyman, "On a statistical problem arising in routine analysis and in sampling
inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[2] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability," Roy. Soc. Phil. Trans., Vol. 236A (1937), p. 364.

[3] H. Scheffé, "On the theory of testing composite hypotheses with one constraint,"
Annals of Math. Stat., Vol. 13 (1942), pp. 280-293.

[4] B. O. Koopman, "On distributions admitting a sufficient statistic," Trans. Amer. Math.
Soc., Vol. 39 (1936), pp. 399-409.



NOTES 

This section is devoted to brief research and expository articles and other short items.


CONVERGENCE OF DISTRIBUTIONS 

By Herbert Robbins


University of North Carolina 
Let $f_n(x)$ $(n = 0, 1, 2, \dots)$ be frequency functions:

(1) $f_n(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_n(x)\,dx = 1$.

There are various ways in which the sequence of distributions corresponding to
the $f_n(x)$ $(n = 1, 2, \dots)$ may be said to converge to the distribution
corresponding to $f_0(x)$. The definition customarily adopted in mathematical statistics¹
(see e.g. [1]) is equivalent to the condition

(a) $\lim_{n \to \infty} \int_{-\infty}^{t} f_n(x)\,dx = \int_{-\infty}^{t} f_0(x)\,dx$ for every $t$.

We shall also consider the two further conditions

(b) $\lim_{n \to \infty} \int_{S} f_n(x)\,dx = \int_{S} f_0(x)\,dx$ for every Borel set $S$,

and

(c) $\lim_{n \to \infty} \int_{S} f_n(x)\,dx = \int_{S} f_0(x)\,dx$ uniformly for all Borel sets $S$.

It is clear that (c) implies (b) and that (b) implies (a). That the converse
implications do not hold is shown by the following examples.

Example 1. Let $f_0(x) = 1$ for $0 < x < 1$ and 0 elsewhere. Choose and fix
any $0 < \epsilon < 1$, set $\delta_n = \epsilon / (n \cdot 2^n)$, and for $n = 1, 2, \dots$ let $f_n(x) = 1/(n \cdot \delta_n)$ for
$i/n - \delta_n < x < i/n$ $(i = 1, 2, \dots, n)$ and 0 elsewhere. If we denote by $S_n$
the set of all $x$ for which $f_n(x) > 0$ it is easy to see that for $n = 1, 2, \dots$

(2) $0 \le \int_{-\infty}^{t} f_0(x)\,dx - \int_{-\infty}^{t} f_n(x)\,dx \le 1/n$ for every $t$,

(3) $\int_{S_n} f_0(x)\,dx = \epsilon / 2^n, \qquad \int_{S_n} f_n(x)\,dx = 1$.


¹ From a well known theorem of Pólya the convergence is then necessarily uniform for all $t$.





Hence for the Borel set $S = \sum_{n=1}^{\infty} S_n$ it follows that

(4) $\int_{S} f_0(x)\,dx \le \sum_{n=1}^{\infty} \int_{S_n} f_0(x)\,dx = \epsilon$,

(5) $\int_{S} f_n(x)\,dx = \int_{S_n} f_n(x)\,dx = 1 \qquad (n = 1, 2, \dots)$.

From (2) we see that (a) holds (uniformly for all $t$), and from (4) and (5) that
(b) fails about as badly as possible.
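
The quantities in (2) and (3) can be checked with exact rational arithmetic. The sketch below is an illustration only; the function names, the default value of `eps`, and the evaluation grid are assumptions introduced here, and the check of (2) is made on a finite grid of values of $t$ rather than for every $t$.

```python
from fractions import Fraction


def clamp01(v):
    """Restrict a rational number to the interval [0, 1]."""
    return max(Fraction(0), min(Fraction(1), v))


def cdf_f0(t):
    """Distribution function of the uniform frequency function f_0 on (0, 1)."""
    return clamp01(Fraction(t))


def cdf_fn(t, n, delta):
    """Distribution function of f_n: mass 1/n spread uniformly over each
    interval (i/n - delta, i/n), i = 1, ..., n."""
    t = Fraction(t)
    pieces = [clamp01((t - (Fraction(i, n) - delta)) / delta) for i in range(1, n + 1)]
    return sum(pieces) / n


def example1(n, eps=Fraction(1, 2), grid=200):
    """Return (min gap, max gap, P_0(S_n), P_n(S_n)) for Example 1."""
    delta = eps / (n * 2 ** n)
    points = [Fraction(j, grid) for j in range(grid + 1)]
    gaps = [cdf_f0(t) - cdf_fn(t, n, delta) for t in points]
    p0_on_sn = n * delta                          # f_0 = 1 on S_n, of measure n * delta
    pn_on_sn = (1 / (n * delta)) * (n * delta)    # height of f_n times measure of S_n
    return min(gaps), max(gaps), p0_on_sn, pn_on_sn
```

For each $n$ tried, the gaps should lie between 0 and $1/n$, while $S_n$ carries probability $\epsilon/2^n$ under $f_0$ and probability 1 under $f_n$.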

This construction can be modified to apply to any $f_0(x)$; thus choosing $f_0(x) =$
$(2\pi)^{-1/2} e^{-x^2/2}$ we can construct $f_n(x)$ $(n = 1, 2, \dots)$ and a Borel set $S$ such that

$\lim_{n \to \infty} \int_{-\infty}^{t} f_n(x)\,dx = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\,dx$ uniformly for all $t$,

while

$\int_{S} f_0(x)\,dx \le .01, \qquad \int_{S} f_n(x)\,dx = 1 \qquad (n = 1, 2, \dots)$.

It is conceivable that some time a statistician, failing to consider such a
possibility, will be led to approximate .01 by 1.

If $X_n$ is a random variable with frequency function $f_n(x)$, if $y = g(x)$ is a Borel
function, and if (a) holds, then it follows from Example 1 that the distribution
function $H_n(y)$ of $Y_n = g(X_n)$, equal to the integral of $f_n(x)$ over the set $S_y$ of all
$x$ such that $g(x) \le y$, need not converge to the distribution function $H_0(y)$ of
$Y_0 = g(X_0)$. It is easily seen that this possibility is excluded if, as commonly
occurs in applications, $g(x)$ is such that for every $y$, the intersection of $S_y$ with
any finite interval is the sum of a finite number of intervals (e.g., if $g(x) = \sin x$).

Example 2. Let $f_0(x)$ be defined as in the previous example, and for $n =$
$1, 2, \dots$ let $f_n(x) = 1 + \sin(2\pi n x)$ for $0 < x < 1$ and 0 elsewhere. By the
Riemann-Lebesgue theorem it follows that (b) holds. But let $S_n$ denote the
set of all $x$ for which $f_n(x) > 1$; then

$\int_{S_n} f_0(x)\,dx = \tfrac{1}{2}, \qquad \int_{S_n} f_n(x)\,dx = \tfrac{1}{2} + \dfrac{1}{\pi} \qquad (n = 1, 2, \dots)$,

so that (c) does not hold.
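
The two integrals of Example 2 can be approximated numerically. The sketch below uses a midpoint rule on $(0, 1)$; the function name and the number of subdivisions are assumptions made for illustration, and the results are approximations rather than exact values.

```python
import math


def example2(n, steps=20000):
    """Approximate P_0(S_n) and P_n(S_n), where S_n is the set on which
    f_n(x) = 1 + sin(2*pi*n*x) exceeds 1, using the midpoint rule."""
    h = 1.0 / steps
    p0_on_sn = 0.0
    pn_on_sn = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        fn = 1.0 + math.sin(2.0 * math.pi * n * x)
        if fn > 1.0:                 # x belongs to S_n
            p0_on_sn += h            # f_0 = 1 on (0, 1)
            pn_on_sn += fn * h
    return p0_on_sn, pn_on_sn
```

For every $n$ the difference between the two values stays near $1/\pi$, so the convergence in (b) cannot be uniform over all Borel sets.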

It follows from these examples that (a), (b), and (c) are successively stronger 
definitions of convergence. We shall now give some definitions equivalent to 
(b) and (c). 

First we recall that the non-negative, completely additive, and absolutely
continuous set functions

(6) $P_n(S) = \int_{S} f_n(x)\,dx \qquad (n = 1, 2, \dots)$

are said to be uniformly absolutely continuous if for every $\epsilon > 0$ there exists a
$\delta > 0$ such that for any $S$ and any $n = 1, 2, \dots$,

(7) $m(S) < \delta$ implies $P_n(S) < \epsilon$.

We shall denote the condition that the $P_n(S)$ be uniformly absolutely continuous
by (u.a.c.), and we shall now prove that (b) is equivalent to

(b') (a) and (u.a.c.).

Proof. (A) Suppose (b) holds. It is clear that (a) holds, and we shall show
by contradiction that (u.a.c.) holds also. For if not then there would exist an
$\epsilon > 0$ such that for any $\eta > 0$ we could find a set $S$ and an integer $n$ such that

(8) $m(S) < \eta, \qquad P_n(S) > \epsilon$.

Moreover, since the set function

$P_0(S) = \int_{S} f_0(x)\,dx$

is absolutely continuous, there exists a $\delta > 0$ such that

(9) $m(S) < \delta$ implies $P_0(S) < \epsilon/2$.

Now by (8) there exists an $S_1$ with $m(S_1) < \delta/2$ and a $k_1$ such that $P_{k_1}(S_1) > \epsilon$.
Next, there exists an $S_2$ with $m(S_2) < \delta/2^2$ and a $k_2$ such that $P_{k_2}(S_2) > \epsilon$, and
it is easy to see that we may assume that $k_2 > k_1$. Proceeding in this way we
find a sequence of integers $k_1 < k_2 < \cdots$ and of sets $S_1, S_2, \dots$ such that

(10) $m(S_n) < \delta/2^n, \qquad P_{k_n}(S_n) > \epsilon \qquad (n = 1, 2, \dots)$.

Let $S = \sum_{1}^{\infty} S_n$; then by (10), $m(S) \le \sum_{1}^{\infty} m(S_n) < \delta$, so that by (9),

(11) $P_0(S) < \epsilon/2$.

But by (10),

(12) $P_{k_n}(S) \ge P_{k_n}(S_n) > \epsilon \qquad (n = 1, 2, \dots)$,

and (11) and (12) together contradict (b).

(B) Suppose (b') holds. We shall show first that (b) holds for any set
$S_1$ such that $m(S_1) < \infty$. Given $\epsilon > 0$, choose $\delta$ so that

(13) $m(S) < \delta$ implies $P_n(S) < \epsilon/8 \qquad (n = 0, 1, 2, \dots)$.

It is known from the theory of measure that corresponding to $S_1$ and to $\delta$ we can
find a set $S_2$ which is the sum of a finite number of disjoint intervals such that

(14) $m((S_1 - S_2) + (S_2 - S_1)) < \delta$.

From (13), (14), and the relations

(15) $P_n(S_1) = P_n(S_2) + P_n(S_1 - S_2) - P_n(S_2 - S_1) \qquad (n = 0, 1, 2, \dots)$,

it follows that

(16) $|P_0(S_1) - P_n(S_1)| \le |P_0(S_2) - P_n(S_2)| + P_n(S_1 - S_2) + P_n(S_2 - S_1) + P_0(S_1 - S_2) + P_0(S_2 - S_1) < |P_0(S_2) - P_n(S_2)| + \epsilon/2$,

and from (a) that for large enough $n$,

(17) $|P_0(S_2) - P_n(S_2)| < \epsilon/2$.

Thus from (16) and (17) it follows that for large enough $n$,

(18) $|P_0(S_1) - P_n(S_1)| < \epsilon$,

which proves (b) for the case $m(S) < \infty$.

Now given any $\epsilon > 0$ choose $\alpha, \beta$ so that, setting $A = \{\alpha < x < \beta\}$, we have

(19) $P_0(A) > 1 - \epsilon/4$.

Then it follows from (a) that for large enough $n$,

(20) $P_n(A) > 1 - \epsilon/2$.

Then for any Borel set $S$ we have for large enough $n$,

$P_n(S) - P_0(S) = P_n(SA) + P_n(S - A) - P_0(SA) - P_0(S - A)$,

$|P_n(S) - P_0(S)| \le |P_n(SA) - P_0(SA)| + P_n(S - A) + P_0(S - A) < |P_n(SA) - P_0(SA)| + \epsilon/2 + \epsilon/4$.

But by the previous case, since $m(SA) < \infty$, for large enough $n$ we shall have
$|P_n(SA) - P_0(SA)| < \epsilon/4$. Hence for large enough $n$,

$|P_n(S) - P_0(S)| < \epsilon$,

so that (b) holds in this case also. This completes the proof.

We shall say that $\lim_{n \to \infty} f_n(x) = f_0(x)$ in measure if for every $\epsilon > 0$ and for
every set $A$ such that $m(A) < \infty$, the measure of the set of all $x$ in $A$ for which
$|f_n(x) - f_0(x)| > \epsilon$ tends to 0 as $n$ increases. (For a space of finite measure
this reduces to the usual definition.) We now observe that (c) is equivalent to

(c') $\lim_{n \to \infty} f_n(x) = f_0(x)$ in measure.

In fact, it is easy to show that (c) is equivalent to convergence in the mean of
order 1,

(c'') $\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\,dx = 0$,

which implies (c'), and a theorem of Scheffé [2] states that (c') implies (c).²
Finally, it is not hard to show that the condition

(d) $\lim_{n \to \infty} f_n(x) = f_0(x)$ almost everywhere

implies (c') but not conversely.

Summing up, we arrive at the following complete set of implication relations
among the various modes of convergence which we have considered:

(20) (d) $\Rightarrow$ (c'') $\Leftrightarrow$ (c') $\Leftrightarrow$ (c) $\Rightarrow$ (b') $\Leftrightarrow$ (b) $\Rightarrow$ (a).

REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, pp. M-tW.

[2] H. Scheffé, "A useful convergence theorem for probability distributions," Annals of
Math. Stat., Vol. 18 (1947), pp. 434-438.

[3] E. J. McShane, Integration, Princeton Univ. Press, 1944, p. 168.


ON RANDOM VARIABLES WITH COMPARABLE PEAKEDNESS 


By Z. W. Birnbaum
University of Washington


The quality of a distribution usually referred to as its peakedness has often
been measured by the fourth moment of the distribution. It is known, however,
that there is no definite connection between the value of the fourth moment and
what one may intuitively consider as the amount of peakedness of a distribution.¹
In the present paper a definition of relative peakedness is proposed and it is shown
that this concept has properties which may make it practically applicable.

Definition. Let $Y$ and $Z$ be real random variables and $Y_1$ and $Z_1$ real
constants. We shall say that $Y$ is more peaked about $Y_1$ than $Z$ about $Z_1$ if the
inequality

$P(|Y - Y_1| \ge T) \le P(|Z - Z_1| \ge T)$

is true for all $T \ge 0$.

If, for example, $Y$ and $Z$ are normal random variables with expectations $Y_1$
and $Z_1$ and standard deviations $\sigma_y$ and $\sigma_z$, and if $\sigma_y < \sigma_z$, then $Y$ is more peaked
about $Y_1$ than $Z$ about $Z_1$. Similarly, if $Y$ is a random variable such that

wl+h P(7 - ^ D/J ~ if is the discrete random variablo 

7 ohI,feh “ ~ ^ ^ than 

io anout the same point. 
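
The normal example can be checked numerically from the definition. In the sketch below the function names and the grid of values of $T$ are assumptions made for illustration; the two-sided tail probability of a normal variable is computed from the complementary error function, and the comparison is made only at the grid points supplied.

```python
import math


def two_sided_tail_normal(T, sd):
    """P(|Y - Y_1| >= T) for a normal variable with standard deviation sd."""
    return math.erfc(T / (sd * math.sqrt(2.0)))


def more_peaked_on_grid(sd_y, sd_z, grid):
    """Check the defining inequality P(|Y - Y_1| >= T) <= P(|Z - Z_1| >= T)
    at every T in the grid (a finite check, not a proof)."""
    return all(
        two_sided_tail_normal(T, sd_y) <= two_sided_tail_normal(T, sd_z)
        for T in grid
    )
```

With `grid = [0.1 * j for j in range(81)]`, the call `more_peaked_on_grid(1.0, 2.0, grid)` should hold, while exchanging the two standard deviations should make it fail.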


² ... convergence theorem on
which the proof is based holds for convergence in measure (see e.g. [3]).

¹ ... kurtosis," Am. Stat. Assn. Jour., Vol. 40 (1945).





Lemma. Let $Y_1, Y_2, Z_1, Z_2$ be continuous random variables² with the probability
densities $\varphi_1(Y_1)$, $\varphi_2(Y_2)$, $f_1(Z_1)$, $f_2(Z_2)$ such that

1°. $Y_1$ and $Y_2$ are independent, $Z_1$ and $Z_2$ are independent,

2°. $\varphi_i(Y_i) = \varphi_i(-Y_i)$ for all $Y_i$, $f_i(Z_i) = f_i(-Z_i)$ for all $Z_i$ $(i = 1, 2)$,

3°. $\varphi_i(Y_i)$ and $f_i(Z_i)$ are not-increasing functions for positive values of the
variables, and

4°. $Y_i$ is more peaked about 0 than $Z_i$, for $i = 1, 2$.

Let $Y = Y_1 + Y_2$ and $Z = Z_1 + Z_2$. Under these assumptions $Y$ is more peaked
about 0 than $Z$.

Proof: Let $\Phi_i(y) = P(Y_i \le y)$, $F_i(z) = P(Z_i \le z)$, for $i = 1, 2$, be the
cumulative probability functions. For any random variables $Y_1, Y_2, Z_1, Z_2$ (not
necessarily continuous) which fulfil assumption 1° we have, for any $T$, the
relationships

$P(Y \le T) - P(Z \le T) = \int_{-\infty}^{\infty} [\Phi_1(T - s)\,d\Phi_2(s) - F_1(T - s)\,dF_2(s)]$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\,d\Phi_2(s) + \int_{-\infty}^{\infty} F_1(T - s)\,[d\Phi_2(s) - dF_2(s)]$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\,d\Phi_2(s) - \int_{-\infty}^{\infty} [\Phi_2(s) - F_2(s)]\,dF_1(T - s)$

$= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\,d\Phi_2(s) + \int_{-\infty}^{\infty} [\Phi_2(T - s) - F_2(T - s)]\,dF_1(s)$

$= I_1(T) + I_2(T)$,

where

$I_1(T) = \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)]\,d\Phi_2(s)$

$= \int_{-\infty}^{\infty} [\Phi_1(-s) - F_1(-s)]\,d\Phi_2(T + s)$

$= \int_{0}^{\infty} \{[F_1(s) - \Phi_1(s)]\,d\Phi_2(T - s) + [\Phi_1(-s) - F_1(-s)]\,d\Phi_2(T + s)\}$,

etc.

² As defined e.g. in H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, 1946, p. 169.





If the random variables have distributions symmetrical about zero (assumption
2°) this is equal to

$\int_{0}^{\infty} \{[P(Z_1 \le s) - P(Y_1 \le s)]\,dP(Y_2 \le T - s) + [P(Y_1 \ge s) - P(Z_1 \ge s)]\,dP(Y_2 \le T + s)\}$

$= \int_{0}^{\infty} \{[1 - P(Z_1 > s) - 1 + P(Y_1 > s)]\,dP(Y_2 \le T - s) + [P(Y_1 \ge s) - P(Z_1 \ge s)]\,dP(Y_2 \le T + s)\}$

$= \int_{0}^{\infty} \{[P(Y_1 \ge s) - P(Z_1 \ge s)]\,d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - [P(Y_1 = s) - P(Z_1 = s)]\,dP(Y_2 \le T - s)\}$,

and we obtain

(1.1) $I_1(T) = \int_{0}^{\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\,d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - \int_{0}^{\infty} [P(Y_1 = s) - P(Z_1 = s)]\,dP(Y_2 \le T - s)$.

By an analogous argument one derives the equality

(1.2) $I_2(T) = \int_{0}^{\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\,d[P(Z_1 \le T + s) + P(Z_1 \le T - s)] - \int_{0}^{\infty} [P(Y_2 = s) - P(Z_2 = s)]\,dP(Z_1 \le T - s)$.


Making use of the assumption that $Y_1, Y_2, Z_1, Z_2$ are continuous random
variables, we conclude that the second integrals in (1.1) and (1.2) are zero, and we
may write

(2.1) $I_1(T) = \int_{0}^{\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)]\,[\varphi_2(T + s) - \varphi_2(T - s)]\,ds$,

(2.2) $I_2(T) = \int_{0}^{\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)]\,[f_1(T + s) - f_1(T - s)]\,ds$.

For $T \ge 0$ we have, making use of assumption 3°,

$\varphi_2(T + s) - \varphi_2(T - s) \le 0$ if $0 \le s \le T$,

$\varphi_2(T + s) - \varphi_2(T - s) = \varphi_2(s + T) - \varphi_2(s - T) \le 0$ if $0 \le T \le s$,

and similarly

$f_1(T + s) - f_1(T - s) \le 0$ for all $T \ge 0$ and $s \ge 0$.

Since according to assumption 4° we also have

$P(Y_1 \ge s) - P(Z_1 \ge s) \le 0$,
$P(Y_2 \ge s) - P(Z_2 \ge s) \le 0$ for $s \ge 0$,




both integrands in (2.1) and (2.2) are non-negative for all values of $s$, and we
conclude

$P(Y \le T) - P(Z \le T) = I_1(T) + I_2(T) \ge 0$,

and hence

(3.1) $P(Y \ge T) - P(Z \ge T) \le 0$ for $T \ge 0$.

From assumption 2° one easily sees that $Y$ and $Z$ have symmetrical probability
distributions. This together with (3.1) leads to

$P(Y \ge T) - P(Z \ge T) = P(Y \le -T) - P(Z \le -T) \le 0$,

and thus to

$P(|Y| \ge T) - P(|Z| \ge T) \le 0$ for $T \ge 0$.

As can be seen from (1.1) and (1.2), the assumptions of the Lemma, in
particular the assumption that all variables are continuous and the assumption 3°,
are rather special sufficient conditions for $Y$ being more peaked about 0 than $Z$.

Theorem 1. Let $Y$ and $Z$ be continuous random variables with probability
densities $\varphi(Y)$ and $f(Z)$ such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$, $f(-Z) = f(Z)$ for all $Z$,

2°. $\varphi(Y)$ and $f(Z)$ are not-increasing functions for positive values of the variables,

3°. $Y$ is more peaked about 0 than $Z$.

Let $Y_1, Y_2, \dots, Y_n$ and $Z_1, Z_2, \dots, Z_n$ be random samples of $Y$ and $Z$,
respectively, and $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$, $\bar{Z}_n = \frac{1}{n} \sum_{j=1}^{n} Z_j$. Then $\bar{Y}_n$ is more peaked about 0 than
$\bar{Z}_n$.

Proof. From the preceding Lemma one concludes by simple induction that
$Y' = Y_1 + Y_2 + \cdots + Y_n$ as well as $Z' = Z_1 + Z_2 + \cdots + Z_n$ are continuous
random variables with distributions symmetrical about zero and probability
densities not-increasing for positive values of the variables, such that $Y'$ is more
peaked about 0 than $Z'$. From this the theorem follows immediately.

The conjecture that assumption 2° of Theorem 1 might be superfluous is
incorrect as may be seen from the following example:

Let $Y$ be any continuous random variable with a distribution symmetrical
about zero and such that $P(|Y| > a) = 0$ for some $a > 0$. Let $Z$ be the
discrete random variable with $P(Z = -a) = P(Z = a) = \tfrac{1}{2}$. We have for
$0 \le T \le a$

$P(|Y| \ge T) \le 1 = P(|Z| \ge T)$,

hence $Y$ is more peaked about 0 than $Z$. If $Y_1, Y_2$ and $Z_1, Z_2$ are random
samples of size 2, we have

$P(\bar{Z}_2 = -a) = P(\bar{Z}_2 = a) = \tfrac{1}{4}, \qquad P(\bar{Z}_2 = 0) = \tfrac{1}{2}$

and thus

$P(|\bar{Z}_2| \ge T) = \tfrac{1}{2}$ for $0 < T \le a$.

The random variable $\bar{Y}_2$ is continuous, with a distribution symmetrical about
zero, such that $P(|\bar{Y}_2| \le a) = 1$. There exists, therefore, a $T_1$ such that
$0 < T_1 \le a$ and that $P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4}$. It follows that

$P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4} > \tfrac{1}{2} = P(|\bar{Z}_2| \ge T_1)$,

hence $\bar{Y}_2$ is not more peaked about zero than $\bar{Z}_2$. The random variable $Z$ is
discrete, but it can be approximated by a continuous random variable with a
U-shaped probability density, so that all the probabilities will be modified only
very slightly and $\bar{Y}_2$ still will not be more peaked than $\bar{Z}_2$. Nothing will change
in this example if one assumes that $Y$ fulfils condition 2° of Theorem 1.

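Theorem 2. Let $Y$ be a continuous random variable such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$,

2°. $\varphi(Y)$ is a not-increasing function for $Y > 0$,

3°. $P(|Y| > a) = 0$ for some $a > 0$.

Let $Y_1, Y_2, \dots, Y_n$ be a random sample of size $n$ and $\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j$. Then,
for any $y \ge 0$, we have

(4.1) $P(|\bar{Y}_n| \ge y) \le \Psi_n\!\left(\dfrac{y}{a}\right)$,

where

(4.2) $\Psi_n(t) = \dfrac{2}{n!} \sum_{(n/2)(t+1) < k \le n} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n$.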



Proof. Let $Z$ be the random variable with uniform distribution in the
interval $-1 \le Z \le 1$. If $Z_1, Z_2, \dots, Z_n$ is a random sample, then $Z' =$
$Z_1 + Z_2 + \cdots + Z_n$ has the cumulative probability function³

$P(Z' \le z) = 0$ for $z < -n$,
$P(Z' \le z) = \dfrac{1}{n!} \sum_{k \le (z+n)/2} (-1)^k \binom{n}{k} \left[\dfrac{z + n}{2} - k\right]^n$ for $-n \le z \le n$,
$P(Z' \le z) = 1$ for $z > n$,

and $\bar{Z}_n = \frac{1}{n} Z'$ has the cumulative probability function

$P(\bar{Z}_n \le t) = 0$ for $t < -1$,
$P(\bar{Z}_n \le t) = \dfrac{1}{n!} \sum_{k \le (n/2)(t+1)} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n$ for $-1 \le t \le 1$,
$P(\bar{Z}_n \le t) = 1$ for $t > 1$.


³ This expression is due to Laplace. For derivation and discussion, see: J. V. Uspensky,
Introduction to Mathematical Probability, McGraw-Hill, 1937, p. 279, and Cramér, op. cit.





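Thus,

$P(|\bar{Z}_n| \ge t) = 2[1 - P(\bar{Z}_n \le t)] = 2\left\{1 - \dfrac{1}{n!} \sum_{k \le (n/2)(t+1)} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n\right\}$,

and in view of the identity

$\sum_{k=0}^{n} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n = n!$

this becomes

$P(|\bar{Z}_n| \ge t) = \dfrac{2}{n!} \sum_{(n/2)(t+1) < k \le n} (-1)^k \binom{n}{k} \left[\dfrac{n}{2}(t + 1) - k\right]^n = \Psi_n(t)$

for $0 \le t \le 1$. The random variable $Y/a$ is obviously more peaked about zero
than $Z$. Since $Y/a$ and $Z$ fulfil the assumptions of Theorem 1, it follows that
$\bar{Y}_n/a$ is more peaked about zero than $\bar{Z}_n$, that is

$P\left(\left|\dfrac{\bar{Y}_n}{a}\right| \ge t\right) \le P(|\bar{Z}_n| \ge t) = \Psi_n(t)$ for $t \ge 0$.

Setting $at = y$, one obtains (4.1).

For $n \to \infty$ the function $\Psi_n(t)$ approaches asymptotically the probability
$P(|X| \ge t\sqrt{3n})$ for the normalized normal random variable $X$.⁴ For $n = 8$
one obtains the following values which indicate a good approximation:

    $t$                         .3998    .5264    .6711
    $P(|X| \ge t\sqrt{24})$     .05      .01      .001
    $\Psi_8(t)$                 .049     .0092    .0005

For smaller values of $n$, $\Psi_n(t)$ can be easily computed.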
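
The finite sum (4.2) is straightforward to evaluate directly. The following sketch is an illustration under the reading of (4.2) as a sum over the integers $k$ with $(n/2)(t+1) < k \le n$; the function names are hypothetical, and the normal comparison uses the complementary error function for the two-sided tail.

```python
import math


def psi(n, t):
    """Evaluate Psi_n(t) of (4.2) for 0 <= t <= 1 by direct summation."""
    c = (n / 2.0) * (t + 1.0)
    total = 0.0
    for k in range(n + 1):
        if k > c:  # only the terms with (n/2)(t+1) < k <= n contribute
            total += (-1) ** k * math.comb(n, k) * (c - k) ** n
    return 2.0 * total / math.factorial(n)


def normal_two_sided(t, n):
    """P(|X| >= t * sqrt(3n)) for a standard normal X."""
    return math.erfc(t * math.sqrt(3.0 * n) / math.sqrt(2.0))
```

For $n = 1$ the sum reduces to $1 - t$, the two-sided tail of a single uniform variable on $(-1, 1)$, which gives a quick check of the signs. Calls such as `psi(8, 0.3998)` and `normal_two_sided(0.3998, 8)` may then be compared with the tabulated values.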


A METHOD FOR OBTAINING RANDOM NUMBERS 

By H. Burke Horton
Interstate Commerce Commission 

The need for large quantities of random numbers to be used in sample design,
subsampling, and other statistical problems is well known. Tippett's [1]
numbers have been widely used for these purposes, despite criticism directed at
their lack of randomness. The following procedure may be of interest to those
who wish to develop their own random series. The method described below will
ultimately be used to record extensive tables of random numbers for general use.

⁴ Cramér, op. cit., p. 245.

Current methods of producing random numbers usually depend upon single
operations of mechanical or electronic devices. These may be described as
"single-stage" random number processes. The numerical results are biased to
the same extent as the devices from which they are taken.

At this point it is desirable to describe a process which may be called
"compound" randomization. Assume two roulette wheels arranged in series so that
the first controls the arrangement of symbols on the second wheel, while a turn
of the second wheel determines which of its positions is to be observed. If the
decimal system is used, the first wheel would have 10! "equally likely" positions,
and the second would have 10 "equally likely" positions. If three such wheels
were to be chained, the first would require (10!)! positions, the second 10!
positions, and the third 10 positions. In general, if $n$ wheels were to be chained,
the first would require $10(!)^{n-1}$ "equally likely" positions. It is not practical
to design such a machine.¹

One method of surmounting these difficulties is to shift to the binary system
in order to take advantage of the fact that 2! = 2; or, in general, $2(!)^{n} = 2$.
This property makes feasible the chaining of any number of machines in series;
and, furthermore, the machines can be of the same design. If desired, the
results taken from a single machine may be chained. Another important feature
is the ease of handling binary chains by electronic systems.

The words "equally likely" have been placed in quotation marks thus far to
indicate that the probabilities are as nearly equal as manufacturing precision
permits. Any simple single-stage device will have some bias, and it is this very
lack of true equality that the chaining process is designed to meet. For
convenience we may take as our binary symbols +1 and -1 rather than the
customary 1 and 0. We adhere to the usual rules regarding the sign of a product.

Let $p_i$ be the probability of obtaining +1 in the $i$th trial (or in the $i$th machine
of a chain of machines), $0 \le p_i \le 1$. $q_i = 1 - p_i$ represents the probability
of obtaining -1 in the $i$th trial.

Let $P_i$ be the probability of obtaining +1 as the product of $i$ trials. $Q_i =$
$1 - P_i$ is the probability of obtaining -1 as the product of $i$ trials. The
following relationships can be set down immediately:

$P_1 = p_1 \qquad\qquad\qquad\qquad Q_1 = q_1$

$P_2 = P_1 p_2 + Q_1 q_2 \qquad\quad Q_2 = P_1 q_2 + Q_1 p_2$

$P_3 = P_2 p_3 + Q_2 q_3 \qquad\quad Q_3 = P_2 q_3 + Q_2 p_3$

$\cdots$

$P_i = P_{i-1} p_i + Q_{i-1} q_i \qquad Q_i = P_{i-1} q_i + Q_{i-1} p_i$


¹ It has been pointed out by Dr. George W. Brown that a practical solution is possible
using any number base, $n$, by addition of random digits $(0, 1, 2, \dots, n - 1)$ modulo $n$.





We may calculate the bias, $P_k - \tfrac{1}{2}$, for a chain of $k$ trials:

$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_k - Q_k) = \tfrac{1}{2}(P_{k-1} p_k + Q_{k-1} q_k - P_{k-1} q_k - Q_{k-1} p_k)$.

Factoring, we have

$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-1} - Q_{k-1})(p_k - q_k)$.

Substituting for $P_{k-1} - Q_{k-1}$ and factoring again,

$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-2} - Q_{k-2})(p_{k-1} - q_{k-1})(p_k - q_k)$.

Continuing the process of substituting and factoring, we obtain

$P_k - \tfrac{1}{2} = \tfrac{1}{2}(p_1 - q_1)(p_2 - q_2) \cdots (p_k - q_k)$,

(1) $P_k - \tfrac{1}{2} = \tfrac{1}{2} \prod_{i=1}^{k} (p_i - q_i) = \tfrac{1}{2} \prod_{i=1}^{k} (2p_i - 1)$.

We may write the general formula for $P_k$:

(2) $P_k = \tfrac{1}{2}\Bigl[1 + \prod_{i=1}^{k} (2p_i - 1)\Bigr]$.

In the special case where all the $p_i$ are equal to a constant, $p$,

(3) $P_k = \tfrac{1}{2}\bigl[1 + (2p - 1)^k\bigr]$.

This can also be derived directly by expansion of $(p - q)^k$.
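
Formula (2) can be compared with the recurrence for $P_i$, $Q_i$ and with a direct enumeration of all $2^k$ sign sequences. The sketch below is illustrative only; the function names are hypothetical and the trials are assumed independent, as in the derivation.

```python
from itertools import product


def chain_recursive(ps):
    """P_k from the recurrence P_i = P_{i-1} p_i + Q_{i-1} q_i."""
    P, Q = ps[0], 1.0 - ps[0]
    for p in ps[1:]:
        q = 1.0 - p
        P, Q = P * p + Q * q, P * q + Q * p
    return P


def chain_closed_form(ps):
    """P_k from formula (2): one half of 1 plus the product of (2 p_i - 1)."""
    prod = 1.0
    for p in ps:
        prod *= 2.0 * p - 1.0
    return 0.5 * (1.0 + prod)


def chain_brute_force(ps):
    """P_k by summing the probabilities of all sign sequences whose product is +1."""
    total = 0.0
    for signs in product((1, -1), repeat=len(ps)):
        prob, sign = 1.0, 1
        for s, p in zip(signs, ps):
            prob *= p if s == 1 else 1.0 - p
            sign *= s
        if sign == 1:
            total += prob
    return total
```

For instance, ten machines each with $p = 0.6$ give a bias of $\tfrac{1}{2}(0.2)^{10}$ by formula (3), far smaller than the bias 0.1 of any single machine.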

If any machine, $r$, in the chain has no bias ($p_r = \tfrac{1}{2}$, exactly), the chain itself
has no bias, since $2p_r - 1 = 0$. Note also that if for all $i$, $0 < p_i < 1$, the bias
of the complete chain is less than the bias of any component (single or multiple)
taken from the chain, because $|2p_i - 1| < 1$. Or stated another way, the
results taken from any machine, no matter how nearly perfect, can be improved 
by chaining with another machine, no matter how biased the latter. Even in 
the limiting case, p = 1 (or 0), the magnitude of the bias remains unchanged; 
in all other cases it is reduced. The bias of final results can be made as small as 
desired by increasing the length of the chain. Compound randomization can be 
regarded as an attrition process which may be used to reduce final bias below 
any preassigned quantity. If the observations taken from two machines in the 
chain should be perfectly correlated, the only effect is to shorten the chain by 
two. 

In shifting from the binary system to the decimal system, symbol bias will be 
introduced. In general, symbol bias will be introduced in passing from a given 
positional system to any other positional system, unless one of the number bases 
is a rational power of the other. 

To illustrate, let us assume that we have a random binary series and wish to 
obtain a random one-digit decimal series. It will be necessary to tabulate the 
binary series in blocks of four symbols. The quantities will range from 0000 
(binary) to 1111 (binary), or from 00 (decimal) to 15 (decimal), with equal 





probabilities. There would be no predominance of either ones or zeros in the 
overall binary tabulation, as illustrated in the table below. 


Binary System     Decimal System

0000              0
0001              1
0010              2
0011              3
0100              4
0101              5
0110              6
0111              7
1000              8
1001              9

Tabulation to this point: 25 zeros, 15 ones (binary); one of each symbol (decimal).

1010              10
1011              11
1100              12
1101              13
1110              14
1111              15

Overall tabulation: 32 zeros, 32 ones (binary); right digit only (decimal): 0-5, 2 each; 6-9, 1 each.

However, if we look at the right digit of the decimal tabulation, it is clear that the
symbols 0 to 5, inclusive, will occur twice as often as the symbols 6 to 9, inclusive.
The easiest way of correcting for this bias is simply to reject all two-digit decimal
numbers which occur, thereby giving equal probabilities to the ten decimal symbols.
The rejection could be accomplished most easily by electronic devices
operating on the binary numbers. All numbers greater than 1001 (binary)
would be excluded through the operation of a simple four-stage electronic
counter.
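The rejection step can be sketched in a few lines. This is only an illustration of the procedure described above, with the electronic counter replaced by a software test; the function name and the seeded pseudo-random bit source are assumptions made for the example, not a source of truly random digits.

```python
import random

def bits_to_decimal_digits(bits):
    """Group bits in blocks of four; keep values 0-9 and reject 10-15."""
    digits = []
    for start in range(0, len(bits) - 3, 4):
        block = bits[start:start + 4]
        value = block[0] * 8 + block[1] * 4 + block[2] * 2 + block[3]
        if value <= 9:  # 1001 (binary) is the largest accepted block
            digits.append(value)
    return digits

rng = random.Random(0)  # fixed seed, for illustration only
bits = [rng.randint(0, 1) for _ in range(40_000)]
digits = bits_to_decimal_digits(bits)
print(len(digits), "digits kept out of", len(bits) // 4, "blocks")
```

Each accepted block maps to exactly one decimal symbol, so the ten symbols are equally likely whenever the sixteen binary blocks are.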

This simple illustration also demonstrates the inefficiency of converting random
four-digit binary numbers to random one-digit decimal numbers. 37.5%
of the data are lost in the process of removing bias. A more efficient procedure
would be to tabulate the random binary series in blocks of ten digits. The
largest number that could occur would be 1 111 111 111 (binary), or 1,023 (decimal).
The numbers would have equal probabilities insofar as this is attainable
by chaining. To obtain a random three-digit decimal series it would be necessary
to reject the numbers above 999 (decimal). This would amount to only
2.34% of the available data. As before, rejection could be accomplished easily
in the binary series by use of a ten-stage electronic counter.
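The two loss figures follow from counting the rejected blocks. A small sketch (the function name is illustrative) that reproduces them exactly:

```python
from fractions import Fraction

def rejected_fraction(block_bits, decimal_digits):
    """Fraction of equally likely binary blocks discarded when only
    values below 10**decimal_digits are kept."""
    total = 2 ** block_bits
    kept = 10 ** decimal_digits
    if kept > total:
        raise ValueError("block too short for that many decimal digits")
    return Fraction(total - kept, total)

four_bit = rejected_fraction(4, 1)   # 6 of 16 blocks rejected
ten_bit = rejected_fraction(10, 3)   # 24 of 1024 blocks rejected
print(float(four_bit) * 100, float(ten_bit) * 100)
```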

Several promising devices are being considered for tabulating random numbers
in accordance with the principles discussed herein. Electronic or electrical
systems actuated by cosmic rays seem to be the most desirable. Tabulating
equipment may be wired to turn out random numbers, possibly as a by-product
of other card runs.

If only a few random numbers are needed, they can be obtained by much
simpler methods. For example, a coin may be tossed, letting heads and tails
represent +1 and -1, respectively. The product of k successive tosses would
be tabulated as the random binary variable. Products equal to +1 and -1
would be coded as 1 and 0, respectively. Blocks of binary symbols would then
be converted to the decimal system as described above.

REFERENCE 

[1] Tippett, L. H. C., Random Sampling Numbers, Tracts for Computers, No. 16, Cambridge
University Press, 1927.


NOTE ON THE ERROR IN INTERPOLATION OF A FUNCTION OF TWO 
INDEPENDENT VARIABLES 

By W. M. Kincaid

University of Michigan

Suppose that $g$ is a function of one real variable $x$ and $h$ is an interpolation
function such that $g(x) = h(x)$ for $x = x_1, x_2, \cdots, x_n$. Let $f(x) = g(x) - h(x)$
and suppose that $\frac{d^n}{dx^n} f(x)$ exists in an interval containing the points
$x_0, x_1, \cdots, x_n$. Then the error in interpolation may be estimated from the well-known
relation

(1) $$f(x_0) = \frac{f^{(n)}(\xi)}{n!}(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n),$$

where $\xi$ is some point in the smallest interval containing $x_0, x_1, \cdots, x_n$.

In the most usual case, where $h(x)$ is a polynomial of degree less than $n$, we
have $f^{(n)}(\xi) = g^{(n)}(\xi)$.
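Relation (1) can be checked exactly in a case where the derivative factor is known. The sketch below is an illustration only: it takes g(x) = x^3 with n = 3 nodes, so that h is the interpolating quadratic and g'''(ξ)/3! = 1 for every ξ; the nodes, the evaluation point and the helper name are arbitrary choices made for the example.

```python
from fractions import Fraction

def lagrange_interpolate(nodes, values, x):
    """Value at x of the polynomial of degree < len(nodes) through the data."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(zip(nodes, values)):
        term = Fraction(yj)
        for m, xm in enumerate(nodes):
            if m != j:
                term *= Fraction(x - xm, xj - xm)
        total += term
    return total

def g(x):
    return x ** 3  # third derivative is 3! everywhere

nodes = [Fraction(0), Fraction(1), Fraction(3)]  # x_1, x_2, x_3
x0 = Fraction(2)
h_x0 = lagrange_interpolate(nodes, [g(x) for x in nodes], x0)
error = g(x0) - h_x0                              # f(x_0)
predicted = (x0 - nodes[0]) * (x0 - nodes[1]) * (x0 - nodes[2])
print(error, predicted)
```

Here the error and the product of the factors (x_0 - x_i) agree exactly, as (1) requires when the n-th derivative divided by n! equals one.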

It is natural to consider the corresponding situation for functions of two independent
real variables $x$ and $y$. Let $g$ and $h$ be two functions such that $g(x, y) = h(x, y)$
for $n$ points $x = x_i$, $y = y_i$ $(i = 1, 2, \cdots, n)$. Setting $f(x, y) = g(x, y) - h(x, y)$
as before, we have $f(x_i, y_i) = 0$ for $i = 1, 2, \cdots, n$. Then if $(x_0, y_0)$
is a point at which $g$ and $h$ are defined, we may ask whether there is any formula
corresponding to (1) from which the error $f(x_0, y_0)$ can be estimated.

Some restrictions must be placed upon the function $f$ if any interesting results
are to be obtained. Let us suppose that $f(x, y)$ can be expanded in a Taylor
series about each of the points $(x_i, y_i)$ $(i = 0, 1, \cdots, n)$ with a region of convergence
sufficient to include all the points of the set. These conditions are more
stringent ones than will be required for obtaining the later results; on the other
hand, they would almost always be satisfied in any practical problem of interpolation,
so it scarcely seems worthwhile to look for the weakest possible conditions
at this point.

The first case of real interest is n = 3. It follows from the general statement
of Taylor's theorem with the remainder that

(2) $$\begin{aligned} 0 = f(x_i, y_i) = {} & f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) \\ & + \tfrac{1}{2}\big[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) \\ & + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\big] \qquad (i = 1, 2, 3), \end{aligned}$$

where $(\xi_i, \eta_i)$ is a point on the line segment joining $(x_0, y_0)$ and $(x_i, y_i)$ for
$i = 1, 2, 3$.

The equations (2) may be regarded as a set of three linear equations in the two
quantities $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$. The condition that these shall be consistent is

(3) $$\begin{vmatrix} f(x_0, y_0) + U_1 & x_1 - x_0 & y_1 - y_0 \\ f(x_0, y_0) + U_2 & x_2 - x_0 & y_2 - y_0 \\ f(x_0, y_0) + U_3 & x_3 - x_0 & y_3 - y_0 \end{vmatrix} = 0,$$

where

$$U_i = \tfrac{1}{2}\big[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\big] \qquad (i = 1, 2, 3).$$

If the three points $(x_i, y_i)$ $(i = 1, 2, 3)$ are not in a straight line, (3) can be
written in the form

(4) $$f(x_0, y_0) = - \frac{\begin{vmatrix} U_1 & x_1 - x_0 & y_1 - y_0 \\ U_2 & x_2 - x_0 & y_2 - y_0 \\ U_3 & x_3 - x_0 & y_3 - y_0 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{vmatrix}}.$$

This expression is analogous to (1), though far less simple and elegant in form.
A similar treatment can evidently be used in all cases of the type $n = \frac{(k+1)(k+2)}{2}$.




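For example, for n = 6 the equation corresponding to (4) is

(5) $$f(x_0, y_0) = - \frac{\begin{vmatrix} V_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 & (x_1 - x_0)(y_1 - y_0) & (y_1 - y_0)^2 \\ V_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 & (x_2 - x_0)(y_2 - y_0) & (y_2 - y_0)^2 \\ V_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2 & (x_3 - x_0)(y_3 - y_0) & (y_3 - y_0)^2 \\ V_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2 & (x_4 - x_0)(y_4 - y_0) & (y_4 - y_0)^2 \\ V_5 & x_5 - x_0 & y_5 - y_0 & (x_5 - x_0)^2 & (x_5 - x_0)(y_5 - y_0) & (y_5 - y_0)^2 \\ V_6 & x_6 - x_0 & y_6 - y_0 & (x_6 - x_0)^2 & (x_6 - x_0)(y_6 - y_0) & (y_6 - y_0)^2 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\ 1 & x_2 & y_2 & x_2^2 & x_2 y_2 & y_2^2 \\ 1 & x_3 & y_3 & x_3^2 & x_3 y_3 & y_3^2 \\ 1 & x_4 & y_4 & x_4^2 & x_4 y_4 & y_4^2 \\ 1 & x_5 & y_5 & x_5^2 & x_5 y_5 & y_5^2 \\ 1 & x_6 & y_6 & x_6^2 & x_6 y_6 & y_6^2 \end{vmatrix}},$$

where

$$\begin{aligned} V_i = \tfrac{1}{6}\big[ & (x_i - x_0)^3 f_{xxx}(\xi_i, \eta_i) + 3(x_i - x_0)^2 (y_i - y_0) f_{xxy}(\xi_i, \eta_i) \\ & + 3(x_i - x_0)(y_i - y_0)^2 f_{xyy}(\xi_i, \eta_i) + (y_i - y_0)^3 f_{yyy}(\xi_i, \eta_i)\big] \qquad (i = 1, 2, \cdots, 6). \end{aligned}$$

(Equation (5) breaks down only if the six points $(x_1, y_1), \cdots, (x_6, y_6)$ lie on a
single conic.)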

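As an example of the general case we may consider n = 4. We write

$$\begin{aligned} f(x_i, y_i) = {} & f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) \\ & + \tfrac{1}{2}\big[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) \\ & + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\big] \qquad (i = 1, 2, 3, 4). \end{aligned}$$

Now,

$$f_{xx}(\xi_i, \eta_i) = f_{xx}(x_0, y_0) + (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i'),$$

where $(\xi_i', \eta_i')$ is a point on the line segment between $(x_0, y_0)$ and $(\xi_i, \eta_i)$.
Proceeding as before yields

(6) $$f(x_0, y_0) = - \frac{\begin{vmatrix} W_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 \\ W_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 \\ W_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2 \\ W_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 & x_1^2 \\ 1 & x_2 & y_2 & x_2^2 \\ 1 & x_3 & y_3 & x_3^2 \\ 1 & x_4 & y_4 & x_4^2 \end{vmatrix}}$$

with

$$\begin{aligned} W_i = \tfrac{1}{2}\big\{ & (x_i - x_0)^2 \big[(\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i')\big] \\ & + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\big\}. \end{aligned}$$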

Corresponding formulas can be derived in this way for any value of $n$; in fact,
several alternatives may be obtained in each case. In all cases the error $f(x_0, y_0)$
is given in terms of the derivatives of $g$ alone if a polynomial of a certain type is
used for the interpolating function. For equation (4), the suitable polynomial
would be $h(x, y) = a + bx + cy$; for (5), $h(x, y) = a + bx + cy + dx^2 + exy + fy^2$;
for (6), $h(x, y) = a + bx + cy + dx^2$. If the interpolating function $h(x, y)$
is not so chosen, the formulas remain valid, but derivatives of $h$ will appear.

The same procedure is applicable to functions of any number of independent 
variables. 


ON A LEMMA BY KOLMOGOROFF 

By Kai-Lai Chung 
Princeton University 

The following lemma was proved by Kolmogoroff [1]: 

If $e_1, e_2, \cdots, e_n$ are independent events and $U$ an arbitrary event such that
($W(X)$ denoting the probability of $X$ and $W_e(X)$ the conditional probability of $X$
under the hypothesis of $e$)

$$W_{e_k}(U) \ge u, \qquad W(e_1 + \cdots + e_n) \ge u.$$

Then

$$W(U) \ge \tfrac{1}{9}u^2.$$

This result seems of some interest in itself and may also have practical applications,
for it is easily seen that [2] in general if $e_1, e_2, \cdots, e_n$ are arbitrary no
information about $W_{e_1 + \cdots + e_n}(U)$ can be obtained from that about $W_{e_k}(U)$,
$k = 1, \cdots, n$. From this point of view the constant 1/9 is interesting, though
it is unimportant in Kolmogoroff's proof of the law of large numbers. Using his
original method this constant can easily be improved to 1/8. However, the following
method will give a better result. At the same time we shall put it into
a more general form.

Let

$$\sum_{i=1}^{n} W(e_i) \ge \beta, \qquad W_{e_i}(U) \ge \alpha.$$





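Then we have for $1 \le k \le n$,

(1) $$W(U) \ge W(U(e_1 + \cdots + e_k)) = W(Ue_1 + \cdots + Ue_k).$$

Now a simple case of certain inequalities due to Bonferroni and Fréchet [3]
states that for arbitrary events $E_1, \cdots, E_k$ we have

(2) $$W(E_1 + \cdots + E_k) \ge \sum_{i=1}^{k} W(E_i) - \sum_{1 \le i < j \le k} W(E_i E_j).$$

Applying this to (1), we obtain

$$W(U) \ge \sum_{i=1}^{k} W(Ue_i) - \sum_{1 \le i < j \le k} W(Ue_i e_j) \ge \sum_{i=1}^{k} W(e_i) W_{e_i}(U) - \sum_{1 \le i < j \le k} W(e_i) W(e_j),$$

using the independence of $e_1, \cdots, e_k$. Hence

$$W(U) \ge \alpha \sum_{i=1}^{k} W(e_i) - \frac{1}{2}\Big(\sum_{i=1}^{k} W(e_i)\Big)^2 + \frac{1}{2}\sum_{i=1}^{k} W^2(e_i).$$

By Cauchy's inequality,

$$\sum_{i=1}^{k} W^2(e_i) \ge \frac{1}{k}\Big(\sum_{i=1}^{k} W(e_i)\Big)^2.$$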

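Writing $s_k = \sum_{i=1}^{k} W(e_i)$, we have

(3) $$W(U) \ge \alpha s_k - \frac{1}{2}\Big(1 - \frac{1}{k}\Big)s_k^2.$$

Now let $0 < \gamma < \gamma_0 \le 1$ where $\gamma$ and $\gamma_0$ are to be determined later. If there is
an $e_i$, $1 \le i \le n$, such that $W(e_i) \ge \gamma\beta$, then

(4) $$W(U) \ge W(Ue_i) = W(e_i) W_{e_i}(U) \ge \gamma\alpha\beta.$$

If every $W(e_i) < \gamma\beta$, we determine $k\,(> 1)$ such that

$$s_{k-1} < \gamma_0\beta \le s_k;$$

thus

$$\gamma_0\beta \le s_k < (\gamma_0 + \gamma)\beta.$$

And (3) yields

(5) $$W(U) \ge \gamma_0\beta\Big[\alpha - \frac{1}{2}\Big(1 - \frac{1}{n}\Big)(\gamma_0 + \gamma)\beta\Big].$$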



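Now we choose $\gamma$ so that the last terms in (4) and (5) be equal. This gives

$$\gamma = \frac{2\alpha - \left(1 - \frac{1}{n}\right)\gamma_0\beta}{2\alpha + \left(1 - \frac{1}{n}\right)\gamma_0\beta}\,\gamma_0.$$

To maximize $\gamma$, we put $\dfrac{d\gamma}{d\gamma_0} = 0$ and find

$$\gamma_0 = \frac{2(\sqrt{2} - 1)\alpha}{\beta}.$$

If $2(\sqrt{2} - 1)\alpha \le \beta$, this choice of $\gamma_0$ is admissible, and we obtain

$$\gamma = \frac{2 - \sqrt{2} + \frac{1}{n}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{n}(\sqrt{2} - 1)} \cdot \frac{2(\sqrt{2} - 1)\alpha}{\beta}.$$

Thus we get (the first inequality being retained for small values of $n$)

(6) $$W(U) \ge \frac{2 - \sqrt{2} + \frac{1}{n}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{n}(\sqrt{2} - 1)}\, 2(\sqrt{2} - 1)\alpha^2 \ge 2(\sqrt{2} - 1)^2\alpha^2.$$

In case $2(\sqrt{2} - 1)\alpha > \beta$, we choose $\gamma_0 = 1$, and we obtain

$$\gamma = \frac{2\alpha - \left(1 - \frac{1}{n}\right)\beta}{2\alpha + \left(1 - \frac{1}{n}\right)\beta}.$$

Thus we get

$$W(U) \ge \frac{2\alpha - \left(1 - \frac{1}{n}\right)\beta}{2\alpha + \left(1 - \frac{1}{n}\right)\beta}\,\alpha\beta \ge \frac{2\alpha - \beta}{2\alpha + \beta}\,\alpha\beta.$$

If we write $\beta = \eta\alpha$, we have

(7) $$W(U) \ge \frac{(2 - \eta)\eta}{2 + \eta}\,\alpha^2.$$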





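We summarize (6) and (7) in the following table:

$\beta/\alpha$     $\ge 2(\sqrt{2} - 1)$              $= \eta < 2(\sqrt{2} - 1)$

$W(U)$             $\ge 2(\sqrt{2} - 1)^2\alpha^2$    $\ge \dfrac{(2 - \eta)\eta}{2 + \eta}\,\alpha^2$

Thus for Kolmogoroff's case ($\eta = 1$) we have $W(U) \ge 2(\sqrt{2} - 1)^2 u^2$.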
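As a numerical illustration of the summary table, the sketch below evaluates the coefficient c(η) in W(U) ≥ c(η)α², taking the two branches 2(√2 - 1)² and (2 - η)η/(2 + η) with the switch at η = 2(√2 - 1). These branch formulas are read from a damaged copy of the table, so they should be treated as an assumption; the function name is illustrative.

```python
import math

def chung_lower_bound_coefficient(eta):
    """Coefficient c(eta) in W(U) >= c(eta) * alpha**2, with eta = beta / alpha."""
    threshold = 2 * (math.sqrt(2) - 1)
    if eta >= threshold:
        return 2 * (math.sqrt(2) - 1) ** 2
    return (2 - eta) * eta / (2 + eta)

kolmogoroff_case = chung_lower_bound_coefficient(1.0)
print(round(kolmogoroff_case, 4), 1 / 8, 1 / 9)
```

On this reading, the coefficient for η = 1 is about 0.343, compared with 1/9 in the original lemma and 1/8 from the refinement of the original method mentioned above; the two branches agree at the switching point.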

REFERENCES 

[1] A. Kolmogoroff, "Bemerkungen zu meiner Arbeit 'Über die Summen zufälliger
Grössen'," Math. Annalen, Vol. 102 (1929), pp. 484-488.

[2] K. L. Chung, "On mutually favorable events," Annals of Math. Stat., Vol. 13 (1942),
pp. 333-349.

[3] M. Fréchet, Les probabilités associées à un système d'événements compatibles et dépendants,
Première partie, Hermann, Paris, 1939, p. 69.


APPROXIMATE WEIGHTS 
By John W. Tukey 
Princeton University 

1. Summary. The greatest fractional increase in variance when a weighted
mean is calculated with approximate weights is, quite closely, the square of the
largest fractional error in an individual weight. The average increase will be
about one-half this amount.

The use of weights accurate to two significant figures, or even to the nearest
number of the form: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30,
32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, or 95, that is
to say, of the form 10(1)20(2)50(5)100 times a power of ten, can thus reduce efficiency by at
most 1/4 percent, which is negligible in almost all applications.

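2. Proof. Let the optimum weights be $W_i$, $i = 1, 2, \cdots, n$, with $W_i > 0$,
where it is convenient to choose the normalization $\sum W_i = 1$. Let $\sigma^2$ be the
variance of $\sum W_i x_i$; then the variance of each $x_i$ must be $\sigma^2/W_i$, and since this
is a weighted mean, the means of the $x_i$ are the same.

Let the approximate weights be $W_i(1 + \lambda\theta_i)$, where $0 < \lambda < 1$ and $|\theta_i| \le 1$,
$i = 1, 2, \cdots, n$. Thus $\lambda$ is the largest fractional error which may be made
in the situation considered. We need the weak requirement $\lambda < 1$. The approximately
weighted mean is

$$\frac{\sum W_i(1 + \lambda\theta_i)x_i}{\sum W_i(1 + \lambda\theta_i)} = \sum W_i x_i \frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta},$$

where $\bar\theta = \sum W_i\theta_i$. Its variance is

$$\sigma^2 \sum W_i\left(\frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta}\right)^2 = \sigma^2\left[1 + \frac{\lambda^2}{(1 + \lambda\bar\theta)^2}\sum W_i(\theta_i - \bar\theta)^2\right] = \sigma^2\left[1 + \lambda^2\frac{\sum W_i\theta_i^2 - \bar\theta^2}{(1 + \lambda\bar\theta)^2}\right]$$

and, since $\sum W_i\theta_i^2 \le 1$, this is bounded by

$$\sigma^2\left[1 + \lambda^2\frac{1 - \bar\theta^2}{(1 + \lambda\bar\theta)^2}\right].$$

Now the only maximum of this expression for $|\bar\theta| \le 1$ occurs when $\bar\theta = -\lambda$,
and the bound becomes

$$\sigma^2\left(1 + \frac{\lambda^2}{1 - \lambda^2}\right) = \frac{\sigma^2}{1 - \lambda^2}.$$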


This proves the first statement in the summary. 
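The bound can be checked by direct computation. The sketch below is an illustration under stated assumptions: σ² is set to 1, the optimum weights and the error pattern θ_i are drawn from a seeded pseudo-random generator, and the function names are invented for the example. It compares the variance ratio computed from first principles with the closed form 1 + λ²(ΣW_iθ_i² - θ̄²)/(1 + λθ̄)² and with the limit 1/(1 - λ²).

```python
import random

def variance_ratio(weights, thetas, lam):
    """Variance of the approximately weighted mean over the optimum variance.
    Each x_i has variance 1 / W_i (sigma^2 = 1) and the W_i sum to 1."""
    approx = [w * (1 + lam * t) for w, t in zip(weights, thetas)]
    total = sum(approx)
    return sum((a / total) ** 2 / w for a, w in zip(approx, weights))

def closed_form(weights, thetas, lam):
    theta_bar = sum(w * t for w, t in zip(weights, thetas))
    second = sum(w * t * t for w, t in zip(weights, thetas))
    return 1 + lam ** 2 * (second - theta_bar ** 2) / (1 + lam * theta_bar) ** 2

rng = random.Random(1)  # fixed seed, illustrative only
raw = [rng.uniform(0.5, 2.0) for _ in range(6)]
weights = [r / sum(raw) for r in raw]
lam = 0.05
worst = 0.0
max_gap = 0.0
for _ in range(2000):
    thetas = [rng.uniform(-1, 1) for _ in weights]
    ratio = variance_ratio(weights, thetas, lam)
    max_gap = max(max_gap, abs(ratio - closed_form(weights, thetas, lam)))
    worst = max(worst, ratio)
bound = 1 / (1 - lam ** 2)
print(worst, bound)
```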

The greatest fractional change which occurs when a number is approximated
by one of the form 10(1)20(2)50(5)100 times a power of ten is 5/105, which occurs, for
example, when 10.499999 ... is replaced by 10. The same estimate applies to
an approximation to two significant figures. The variance is thus multiplied
by a factor bounded by

$$1 + \frac{(5/105)^2}{1 - (5/105)^2} = 1 + \frac{1}{440} < 1.0025,$$

which proves the second statement.
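The rounding errors of the two schemes can be tabulated exactly. In this sketch (function names are illustrative) the worst case between neighbouring grid values `low` and `high` is taken at their midpoint, where the fractional error relative to the true value is (high - low)/(high + low); the variance factor is then 1/(1 - λ²) as derived above.

```python
from fractions import Fraction

def max_fractional_error(grid, decade_end):
    """Largest fractional rounding error over one decade of the grid."""
    points = list(grid) + [decade_end]
    return max(Fraction(high - low, high + low)
               for low, high in zip(points, points[1:]))

def variance_inflation(lam):
    """Upper bound on the variance multiplier for largest fractional error lam."""
    return 1 / (1 - lam ** 2)

curtailed = list(range(10, 20)) + list(range(20, 50, 2)) + list(range(50, 100, 5))
lam_curtailed = max_fractional_error(curtailed, 100)

simple = [10, 15, 20, 30, 40, 50, 70]
lam_simple = max_fractional_error(simple, 100)

print(len(curtailed), lam_curtailed, variance_inflation(lam_curtailed) - 1)
print(lam_simple, variance_inflation(lam_simple) - 1)
```

The curtailed grid has 35 entries with largest fractional error 1/21 = 5/105, giving an excess variance of 1/440; the simpler seven-value grid has largest fractional error 1/5 and an excess of 1/24.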

The use of a weight of the simpler form 10, 15, 20, 30, 40, 50, 70, times a
power of ten is seen in the same way to lead to an increase in variance and a
decrease in efficiency of at most 4J percent.

3. Comment. It is interesting to compare the 90 possible values for 2 significant
figures, the 35 possible values for the numbers proposed above, which
might be called two curtailed significant figures, and the 24 possible values for
logarithmic spacing at interval $(1.05)^2$, all of which extend over one power of
ten with the same maximum fractional error in rounding. The use of the curtailed
scheme for critical tables of weights and weighting coefficients would save
more than 60 percent of the entries needed for two complete significant figures.

This device applies equally well to other numbers of significant figures. 





ON THE USE OF THE NON-CENTRAL t-DISTRIBUTION FOR COMPARING
PERCENTAGE POINTS OF NORMAL POPULATIONS

By John E. Walsh
Princeton University

1. Introduction. Consider two normal populations with the same variance
and means $\mu$ and $\nu$ respectively. It is well known that confidence intervals and
significance tests can be obtained for the difference $\mu - \nu$. Since $\mu$ is the 50%
point of the first population and $\nu$ is the 50% point of the second population,
this represents a particular solution of the general problem of obtaining confidence
intervals and significance tests for the difference $\theta_\alpha - \varphi_\beta$, where $\theta_\alpha$ is
the $\alpha$ percent point of the first population and $\varphi_\beta$ is the $\beta$ percent point of the
second population. The purpose of this note is to point out that the results of
Johnson and Welch [1] for the non-central t-distribution can be used to furnish
a solution of the general problem.

2. Analysis. Let $A_\gamma$ be the $\gamma$ percent point of the normal population with
zero mean and unit variance (i.e. exactly $\gamma$% of the population has values less
than $A_\gamma$). Then if $\sigma$ is the common standard deviation,

$$\theta_\alpha = \mu + A_\alpha\sigma, \qquad \varphi_\beta = \nu + A_\beta\sigma.$$

Thus

$$\theta_\alpha - \varphi_\beta = (\mu - \nu) + (A_\alpha - A_\beta)\sigma.$$

The non-central t-distribution investigated by Johnson and Welch in [1] is
based on the quantity

$$t = (z + \delta)/\sqrt{w},$$

where $z$ has a normal distribution with zero mean and unit variance, $\delta$ is a constant,
and $fw$ has a $\chi^2$-distribution with $f$ degrees of freedom and is distributed
independently of $z$. Methods and tables are given in [1] whereby a constant
$t(f, \delta, \epsilon)$ can be computed having the property that

$$\Pr[t > t(f, \delta, \epsilon)] = \epsilon.$$

These relations will be used to obtain confidence intervals for $\theta_\alpha - \varphi_\beta$. The
resulting confidence intervals can be used to obtain significance tests for $\theta_\alpha - \varphi_\beta$.

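Let $x_1, \cdots, x_n$ be a random sample of size $n$ from the first population while
$y_1, \cdots, y_m$ is a random sample of size $m$ from the second population. Then
consider

$$\frac{(\bar{x} - \bar{y}) - (\theta_\alpha - \varphi_\beta)}{\sqrt{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}}\,\sqrt{\frac{m + n - 2}{\frac{1}{m} + \frac{1}{n}}}.$$

This quantity has a non-central t-distribution with

$$\delta = -(A_\alpha - A_\beta)\Big/\sqrt{\frac{1}{m} + \frac{1}{n}}, \qquad f = m + n - 2.$$

For notational simplicity let

$$t(f, \delta, \epsilon) = t(\epsilon), \qquad \sum (x_i - \bar{x})^2 = S_1^2, \qquad \sum (y_j - \bar{y})^2 = S_2^2.$$

Then one-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient $\epsilon$
are given by

$$\theta_\alpha - \varphi_\beta < \bar{x} - \bar{y} - \frac{t(\epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}},$$

$$\theta_\alpha - \varphi_\beta > \bar{x} - \bar{y} - \frac{t(1 - \epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}}.$$

Two-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient
$1 - (\epsilon_1 + \epsilon_2)$ are given by

$$\bar{x} - \bar{y} - \frac{t(\epsilon_2)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}} < \theta_\alpha - \varphi_\beta < \bar{x} - \bar{y} - \frac{t(1 - \epsilon_1)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\left(\frac{1}{m} + \frac{1}{n}\right)}},$$

where $\epsilon_1 + \epsilon_2 < 1$.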
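The quantities entering these intervals are simple to compute once the percentage point t(f, δ, ε) has been read from the tables of [1]. The sketch below is an illustration only: it uses the Python standard library for the normal percentage points, takes δ = -(A_α - A_β)/√(1/m + 1/n) and f = m + n - 2 as in the analysis above, and leaves t(ε) as an argument supplied by the user rather than computing it. The function names and sample data are hypothetical.

```python
import math
from statistics import NormalDist

def noncentral_parameters(alpha_pct, beta_pct, n, m):
    """Return (delta, f) for comparing the alpha% and beta% points."""
    a_alpha = NormalDist().inv_cdf(alpha_pct / 100)
    a_beta = NormalDist().inv_cdf(beta_pct / 100)
    delta = -(a_alpha - a_beta) / math.sqrt(1 / m + 1 / n)
    return delta, m + n - 2

def one_sided_limit(xs, ys, t_value):
    """x_bar - y_bar - t_value * sqrt(S1^2 + S2^2) / sqrt((m+n-2)/(1/m+1/n)).
    t_value must be taken from tables of the non-central t-distribution."""
    n, m = len(xs), len(ys)
    x_bar, y_bar = sum(xs) / n, sum(ys) / m
    s1 = sum((x - x_bar) ** 2 for x in xs)
    s2 = sum((y - y_bar) ** 2 for y in ys)
    scale = math.sqrt((m + n - 2) / (1 / m + 1 / n))
    return x_bar - y_bar - t_value * math.sqrt(s1 + s2) / scale

delta, f = noncentral_parameters(90, 10, n=10, m=10)
print(round(delta, 4), f)
```

When α = β the parameter δ vanishes and the procedure reduces to the familiar central t interval for the difference of means.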


REFERENCE

[1] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), pp. 362-389.



THE TEACHING OF STATISTICS 

A report of the Institute of Mathematical Statistics Committee on the
Teaching of Statistics¹

PREFATORY NOTE 

This report on the teaching of statistics contains two parts. Part I is a summary
of the conclusions reached by the committee concerning the appropriate
content and organization of teaching in statistics. It is oriented towards the
future, and is intended as a program for action. Part II, mainly the work of the
chairman of the committee, is a more intensive discussion of the general problem.
It surveys the present state of the teaching of statistics, probes some of the
reasons for existing weaknesses in this teaching, and states more fully the basis
for the conclusions summarized in Part I.

Additional material, with special reference to applied statistics, is contained
in a report of The Committee on Applied Mathematical Statistics of the National
Research Council, entitled Personnel and Training Problems Created by the
Recent Growth of Applied Statistics in the United States.²

PART I 

SUMMARY OF CONCLUSIONS 

1. Who are the prospective students of statistics? A complete teaching program
in statistics must be designed to meet the needs of four principal categories
of students, listed here according to the amount of training in statistics that is
needed to meet their requirements.

a. All college students. Statistical method is a vital branch of scientific
method. It is widely used in most sciences, business, government, and ordinary
life. Some understanding of the nature of inductive inference from quantitative
data on the basis of the theory of probability as portrayed in statistical method
is an indispensable part of a liberal education.

b. Future consumers of statistics. Some students will specialize in administration,
business, or other subject-matter that will require them to understand
the results of statistical analyses of special problems, although they themselves
do not make these analyses. For example, business executives and government
administrators must frequently base action on statistical studies. Research
workers and teachers in many fields may not themselves use statistical methods,
yet in order to keep abreast of their own or cognate fields they must read and
understand studies using statistical methods.

c. Future users of statistical methods. A still smaller group of students of
statistics are training themselves for careers of specialization in economics, population,
sociology, housing, business, business research, industrial design, industrial
production, personnel, purchasing, public opinion, biology, agricultural
science, metallurgy, physics, chemistry, psychology, or some other field that
makes extensive use of statistics. Research in these fields often requires the
use of advanced statistical techniques, and even the development of new statistical
theory. Students planning to do such research need statistical theory and
methods as a tool.

¹ The Committee consists of Harold Hotelling, Chairman; Walter Bartky, W. Edwards
Deming, Milton Friedman, and Paul Hoel.

² Copies may be obtained from the National Research Council, 2101 Constitution Ave.,
Washington 26.

d. Future producers and teachers of statistical methods. The smallest, but in
many respects most crucial group of students of statistics, are those who intend
to specialize in statistical methods for the sake of statistical methodology.
Many of these will become teachers or full-time research workers, though some
will find posts in government and industry in high-grade statistical work, frequently
requiring the development of new statistical theory and methods.
These students will become tool-makers.


2. What should they be taught? 

a. All college students.³ The fundamental logic and philosophy of statistics
can be taught at an early stage. It is perhaps an appropriate subject to include
in the kind of survey courses of physical or social sciences that have become so
common in recent years. Three or four weeks of lectures and discussions should
suffice to acquaint the students with the broad principles of inductive inference.
No mathematics need be included, although some elementary experiments may
well be performed to instil the concepts of sampling variation, randomness, and
statistical predictability. The student even at this stage can be made to recognize
the fundamentally statistical character of most decisions, arising from the
fact that they involve an element of uncertainty and a balancing of the importance
of different types of errors. The student can be made to understand the
fundamental difference between inductive and deductive statements, the nature
of statistical estimation, and the nature of a statistical hypothesis. These
concepts can be made concrete by illustrating them in terms of problems ranging
from everyday questions such as whether to cross a street in the middle of the
block on up to such vital problems as the construction of an appropriate social
security plan, or the design of an efficient experiment for selecting the best variety
of corn, or the selection of the best method of testing for the presence of a disease.

kind, of statistics need two 

kmds of training m statistics. First, they need somo knowledge of the kind of 

S ; n ? li'^itations. To meet this need they require what mav 

fieldVsnecklizat'^'^^ statistics," which places special emphasis on their owm 
^ ” ooo-oemosto ooor .0 k ,„mo depart- 

__the social sciences, or biological sciences) should meet 


teaS "f'etXCapVoi^ by a omnmiUee on the 

Society in 1947 aa a kport to the CouLiM^r*™ Published by the 

Statistical Society, vol. ox, Part I, 1947 . ’ Published in the Journal of the Royal 





this need. In addition, they need a reasonably thorough understanding of what
statistics can and cannot do, what the major statistical techniques are, and how
to interpret the results obtained by the application of such techniques. This
need may be met for those students who have some mathematical background
by all or part of the fundamental one-year course discussed in the next section.
For students lacking this background, special courses along similar lines will
be required.

c. Future users of statistical methods. It is essential for fruitful application
that users of statistical methods should not mechanically apply procedures
learned by rote or taken from a manual. Since few research problems fit perfectly
into clearly defined patterns, nothing is so important to the successful
collection and analysis of statistical data as adaptability and flexibility in using
techniques. These require a thorough comprehension of the logical foundations
of statistics, especially of the assumptions underlying its various technical
devices, and sufficient knowledge of the derivations of these devices to be able to
adapt them to the special circumstances that inevitably develop. To provide
this background, a minimum of a full year fundamental course in statistical
methods is essential, followed by courses of application. It is highly desirable
that this fundamental course be based on calculus as a prerequisite, because without
it a proper understanding of the development of statistical techniques cannot
be attained. But this is probably impossible at present, in view of the unfortunately
low level of mathematical training of most college students. As an
expedient, and it is hoped a temporary expedient, it is recommended that the
fundamental course be given in two sections, one requiring calculus, the other
only a knowledge of first-year college algebra. A single course (or pair of courses,
in line with the temporary expedient just mentioned) should suffice for all departments,
because the core of statistical methods is common to all fields of study.
Given in this way, the fundamental course can have the advantage of being
taught by the most competent statisticians in the institution.

In addition to a thorough training in theory and methods, users of statistical
methods need training in applications. This can be provided by courses in
various applied fields. It is usually advisable that these courses be given in the
department of application (agriculture, population, engineering, economics,
psychology, etc.), and require the fundamental one-year course as a prerequisite.

d. Future research workers and teachers of statistical method. The future
research workers and teachers of statistical method clearly require far more
intensive training in theory than has so far been suggested. A fundamental
prerequisite to such training is knowledge of some advanced mathematics. It
is difficult to specify exactly what or how much mathematics is necessary, but
something of the algebra of matrices and of the theory of functions are minimum
necessities, and a good deal of additional knowledge of algebra, geometry, and
analysis adds richness and power to the work of the statistical theorist.

In addition to advanced mathematics and advanced work in statistical method,
the future statistical theorist needs a good deal of work on applications, in the
form either of experience or courses. He will be a tool-maker, and needs to
know by personal experience something of the problems of those who use his
tools. One satisfactory arrangement is an internship in statistical research,
as is currently provided by some institutions. By this arrangement, interns
work under competent leadership in various government or private agencies
that are engaged in large-scale statistical studies. The interns do research in
theory, adapt the physical circumstances to theory and vice versa, and have
actual practice in the design of experiments, the construction of questionnaires,
writing of instructions, planning tabulations, analyzing the results, and examining
sampling variances.

It is obvious that proper advanced courses in statistics will for many years be
the province of a few institutions only, as there does not exist at present an adequate
professional body to man more than a few.


3. Who should teach statistics? It is clear from the preceding section that
two different kinds of courses are required to meet the needs of students of
statistics: first, courses in statistical method and methodology; and second,
courses in applications of statistical methods to particular fields.

The most important requirement for a successful university program in statistics
is that courses in statistical method and methodology should be taught by a
statistical theorist, a man who has had the training outlined in Art. 2d above,
is specializing in statistics, is doing research in statistical method, and who has
had some first-hand acquaintance with applications of statistical techniques.
This is the only way such courses can be kept abreast of developments and
sufficiently broad to meet the needs of all departments. This recommendation
may seem to belabor the obvious, but a glance at the qualifications of most
people currently teaching statistical methods will show why it is necessary.

Most courses in applications should be taught by people thoroughly conversant
with the relevant subject-matter fields as well as statistical methodology.
Some courses in applications may be taught by statistical theorists, particularly
new applications or applications that are common to many fields.

4. How should the teaching of statistics be organized? The teaching program
in statistics should be organized around a separate administrative unit, an Institute
or Department of Statistics. This department should be primarily responsible
for the teaching of courses in statistical methods: the fundamental course

Oi f f “'^thods for particular fields 

SI. ^ 

“Station' anpmmettto of U.0 

wS aootaZI f 'Tf oontaimiloatdon 

lo, Itself and othot d.partraente,- Md might undertako diiooUy, o. 


⁴ See the interesting suggestions on this point on p. 11 in Personnel and Training Problems,
loc. cit.




through an associated research staff, special assignments involving the application
of statistical methods to concrete problems.

Intermediate courses dealing primarily with applications ordinarily belong in
other departments (agriculture, economics, demography, engineering, biology,
etc.), although some may be given in the department of statistics. The exact
location of courses in application will depend on the accident of the departmental
affiliation of the persons competent to teach them. Coordination of the
teaching program in statistics can be achieved by an interdepartmental committee.
The department of statistics should not, however, consist of such a
committee under a different name. It should be a thoroughly independent department,
with all or most of its members entirely in the department.

The recommendation that the responsibility for teaching statistical methods be
centered in a separate department is based on the belief that the teaching of
statistical methods without theory can only be uninspiring and harmful; that a
separate department of statistics offers the only arrangement that can assure
statistical theory being taught by competent theorists, and the only satisfactory
arrangement for ensuring the strong incentive for statistical research, with appropriate
recognition and advancement, which is as necessary for the teaching of
statistics as for the teaching of any other subject.

5. What should be done about adult education? The preceding recommendations
are all directed toward the teaching of statistics to undergraduate and
graduate students. There is an additional need that these do not meet, namely,
the provision of training to mature research workers in various fields already
established in their professions. This need arises in part from the inadequate
teaching of statistics in the past, but even more from the extremely rapid advance
in the theory and practice of statistics which has made it difficult for any but
the specialist to keep abreast of developments. Some institutions are making
efforts to meet this need by providing evening and late-afternoon classes for
employed research workers. Such classes are feasible only in the larger centres
of statistical activity. There is also the need of providing advanced research
workers in particular fields with highly specialized guidance in selected topics.
A department of statistics organized along the lines suggested above can contribute
toward meeting this need by effective counseling of colleagues in other
departments, and by organizing special seminars and lectures for them. The
professional statistical associations are also contributing by arranging special
expository programs.





THE TEACHING OF STATISTICS


PART II

THE PLACE OF STATISTICS IN THE UNIVERSITY*


Contents


A. Minor nuisances and inefficiencies in statistical teaching
    6. Lack of coordination among departments. Lack of advanced courses and
       laboratory facilities
    7. Inefficient decentralization of libraries
B. The major evil: failure to recognize the statistical method as a science,
   requiring specialists to teach it
    8. Too many teachers not specialists
    9. Results: students ill equipped
    10. Reasons why teachers of statistics are often not specialists
        a. The rapid growth of the subject
        b. Confusion between the statistical method and applied statistics
        c. Failure to recognize the need for continuing research
        d. The system of making appointments to teach statistics within
           particular departments that are devoted primarily to other subjects
    11. Appointments under the existing system are not all bad
    12. Unsatisfactory texts
    13. Omission of probability theory from texts and teaching
C. Proper qualifications of teachers of statistics
    14. Statistics compared with other subjects
    15. Current research in the statistical method is essential for teachers
    16. Minimum requirements in mathematics for the training of teachers and
        research men in statistical theory
D. Need for relating theory with applied statistics
    17. An example of the interaction between theory and practice
    18. Supplying opportunities for application in graduate studies of statistics
E. Recommendations on the organization of statistical teaching and research in
   institutions of higher learning
    19. Research should be encouraged; teaching schedules should not be overloaded
    20. Organization of statistical service in the university
    21. Organization for teaching
    22. The statistical curriculum
    23. Statistical method as part of a liberal education


A. MINOR NUISANCES AND INEFFICIENCIES IN STATISTICAL TEACHING

6. Lack of coordination among departments. Lack of advanced courses and
laboratory facilities. The teaching of statistics in American colleges and
universities, which has for the most part been a development since the first world
war and has now reached large proportions, presents a number of unsatisfactory
features. Courses in statistical methods are taught in various departments
without coordination or intercommunication. These courses cover what is to
[a large extent the same material, with] variations in the selection of
subjects according to the ideas and abilities of individual instructors, and with
illustrative examples drawn in each case from material pertaining to the
department in which the course is taught. Thus a student desiring to learn more about
statistics than he can obtain in one department must, in taking courses in other
departments, repeat a great deal of what he has previously covered.

* [...] version of this part, prepared entirely by the chairman, is being published [...]

There is a plethora of elementary courses and a dearth of advanced ones. 
Some departments have excellent statistical laboratories which they reserve for 
the use of their own students, each with an attendant to keep others away, while 
other departments have none. Some classes in elementary statistics are too 
large and some too small, with no one in a position to equalize the sections
between different departments.

7. Inefficient decentralization of libraries. The library situation is confused. 
Books on statistical methods are catalogued and shelved under Sociology, 
Economics, Business, Psychology, Zoology, Botany, Engineering, and Medicine. 
Books on probability are divided between Philosophy, Mathematics, Physics,
and Chemistry. Books on the method of least squares are for the most part 
divided between Mathematics, Astronomy, and Civil Engineering, though some 
get into the Economics, Geology, and Physics reading-rooms. Works on the 
analysis of variance and design of experiments are likely to be concentrated under 
Agriculture, while methods of approximate evaluation of multiple integrals and 
similar purely mathematical subjects of use in statistics are, at least in one of our
largest universities, to be found only in the library of Biology.

B. THE MAJOR EVIL: FAILURE TO RECOGNIZE STATISTICAL METHOD AS A
SCIENCE, REQUIRING SPECIALISTS TO TEACH IT

8. Too many teachers not specialists. The above nuisances are but minor. 
The major evil is that those attempting to teach statistical method are all too
often not specialists in the subject. Their original selection was seldom on the 
basis of scholarship in this field; they are not encouraged to make advanced 
studies in it; and their environment is such as to draw their attention in every 
direction except to the central truths and problems of their science. Frequently 
they lack the knowledge of mathematics necessary to begin to read the more 
serious literature of the subject that they are teaching. Many have been utterly 
unable to keep up with the rapid progress which has been taking place in statistical
methods and theory, progress which affects even the most elementary things to
be taught.

9. Results: students ill equipped. There results a widespread teaching of 
wrong theories and inefficient methods. Students are sent to the government 
service and to industrial and commercial statistical positions equipped with the 
skill that results from careful drilling in methods that ought never to be used. 
Some of these same students are encouraged and assisted to become college and 
university teachers of statistics without ever making thorough-going studies of 
the fundamentals of the subject, or exhibiting any power of making original
contributions to it, or studying any graduate mathematics. Through the method of
selection of teachers in general use, and through textbooks written by individuals
of this type, there is a perpetuation of obsolete ideas and unsound methods.

All this does not mean that any considerable number of people teaching statistics
are unworthy or objectionable members of the academic community. Many,
indeed, are of superior intellect, upright character, personal charm, and
undoubted teaching ability. Some are making creative contributions to other
subjects. The only trouble is that they are teaching a subject in which they are not
specialists, and which progresses so fast that only specialists can keep up with it.

10. Reasons why teachers of statistics are often not specialists. The chief
reasons for the extensive teaching of statistical method by people who are not
specialists in it appear to be the following:

a. The rapid growth of the subject and multiplication of its applications, creating
a very large and very urgent demand for teaching it that could not be met
immediately by the small existing number of scholars specializing in statistical
method. This difficulty is aggravated by the paucity of university facilities
for training advanced scholars in the field, so that even now the available number
of such scholars cannot be expanded with sufficient rapidity to meet the current
need. As specialists have not been available in anything like sufficient numbers,
statistical method has inevitably been taught largely by non-specialists.

b. Confusion between statistical method and applied statistics. Statistical
method is a coherent, unified science. “Applied statistics” may mean any of
thousands of diverse things. Any particular study in applied statistics will
ordinarily utilize some few of the results obtained by the science of statistical
method, but will be largely concerned with matters peculiar to the particular
application in view and others closely related to it. For example, studies of
business cycles utilize statistical methods, good or bad, with a view to drawing
inferences from existing data on prices, production, incomes, interest rates, bank
reserves and the like. The main job of the applied statistician in this field is to
study the sources and nature of the various series of observations, keeping in
mind incidental events which may break the continuity of a series, and watching,
with a background of economic theory and knowledge of the facts, for
explanations. He should also be well acquainted with statistical theory, since
otherwise there is grave danger of wasting or misinterpreting the laboriously
accumulated observations. Indeed, an organization studying business cycles, or solar
cycles, or rat psychology or cancer or practically anything else, would almost
certainly benefit from participation by a specialist in statistical method.

However, the chief attention in any such study will not be on statistical method 
but on features peculiar to its own scope. The specialist in statistical method 
will do well to participate occasionally in such a study, but if he does so too
extensively the needs of the application will so engross his attention that he cannot
keep up with the progress of statistical method itself.

The call of applications is enticing, and has led many young scholars to forsake
the cultivation of statistical theory. The applications have benefited greatly
by the process. Moreover, problems brought back in this way from applications
have provided valuable inspiration in developing theory. The mistake
lies in supposing that participation in applied statistics is equivalent to
specialization in statistical method and theory, and the consequent appointment to teach
the latter of persons whose sole concern is with the former.





c. Failure to recognize the need for continuing research in the theory of statistics
by those who teach it. There is an easy tendency to assume that all the requisite
ideas and formulae can be found in some book, and that the duty of the teacher
of statistics is simply to transfer this established book-knowledge to the minds
of the students and impart to them skill in applying it. Similar attitudes
applied to other subjects have in the past been a drag on progress, and have long
been discarded in respectable universities. They still hang on, however, even
in the best institutions with respect to statistics. The spectacular advances of
the last three decades in statistics should make it clear to anyone who has followed
them that statistical method is far from static, that the best techniques of present-
day statistics may tomorrow be replaced by something better, and that
unsolved problems regarding the theory and methods of statistics are sticking out
in every direction. A vast amount of research, mostly of a highly mathematical
character, is needed and is in prospect. Anyone who does not keep in active
touch with this research will after a short time not be a suitable teacher of
statistics. Unfortunately, too many people like to do their statistical work as they
say their prayers: merely substitute in a formula found in a highly respected
book written a long time ago.

d. The system of making appointments to teach statistics within particular
departments that are devoted primarily to other subjects. In effect, the teacher of
statistical method is too often selected by economists or sociologists or engineers or
psychologists or medical men because he is to teach in one of these departments.
Thus the task of selection devolves upon people unacquainted with the subject,
though realizing the need for it in connection with a very specific application.
Under such conditions there is an inevitable tendency to emphasize the
immediately practical and specific at the expense of the fundamental work of wider
applicability and greater long-run importance. Confusion between a science and
its applications is most pronounced with those who know little about it, and the
distinction between statistical method and applied statistics is likely to be
completely lost when a sociologist or an engineer is confronted with the problem of
finding someone to teach statistics. If he does make the distinction at all he is
likely to choose in favor of applied statistics.

Strangely, the actual teaching that ensues is bound to consist largely of
statistical theory, because the students will ordinarily not have had statistical theory
elsewhere, and they must have some in order to apply it. What often happens is
that a sociologist or an engineer who has made some study of statistics embarks 
on what he thinks will be a career of teaching the application of statistical method 
to sociological or engineering problems, only to discover that because of the 
ignorance of the students he is compelled to teach the fundamentals of statistics, 
an entirely different subject for which he lacks preparation, talent, and interest. 

An incident of this sort has been cited previously.¹ A prominent economist
was asked to teach a course entitled “Price forecasting” in a leading university,
and accepted. He found, however, that his lectures on this subject were over
the heads of the students because he was using statistical concepts unfamiliar to
them. He therefore went back over the ground covered so as to explain these
particular statistical concepts along with their application. But in explaining
them he found himself using other statistical concepts, which in turn called for
explanation. At the end of the semester he found that he had not given the
course in price forecasting which he had planned, and for which the large class had
enrolled, but instead had taught a somewhat disordered course in elementary
statistics, a subject in which he did not feel particularly competent, and for which
the students had not come. When he was asked to teach price forecasting a year
later he proposed that a prerequisite of a course in statistics be imposed, but this
proposal was rejected by the chairman of the department, and the course was not
repeated.

¹ Harold Hotelling, "The teaching of statistics," Annals of Math. Stat., vol. xi, 1940,
pp. 457-470.

11. Appointments under the existing system are not all bad. More by accident
than by design in the existing system, not all statistical appointments by
departments of application are bad. Some professors in these departments make
conscientious excursions into statistical theory, are well advised by competent
specialists in statistics, and bring about the appointment of men of high quality
well acquainted with statistical method and theory of the currently best sort.
This may work out well if the man so appointed is an able and energetic scholar
deeply devoted to his subject, if he is placed immediately in the highest
professorial rank, and if he does not feel under obligation to devote himself too
exclusively to the special interests of the department of which he finds himself a
member. He is then free to pursue his specialty, to keep informed on the latest
developments in statistical method and himself to add to the subject, while at
the same time transmitting to students a well rounded and up-to-date selection
of knowledge. It is in this way that some of the present leaders in statistics have
developed. It is a wrong procedure, however, to depend on accidents of this
kind.


The system of departmental organization and of making appointments and
recognizing proficiency in the teaching of statistics needs to be altered. The
usual story is typified by the appointment of a promising young scholar in
statistical method to a junior position in some department of application where he
is expected to work on problems and to teach statistical methods with a sole eye
to the work of the specific department. He is then under pressure to concentrate
on a particular kind of applied statistics, for his advancement will depend, not
on his statistical attainments at all, but on his study of the literature,
terminology, techniques and theories of the application. His usual associates will be
in the department in which he is teaching rather than others teaching statistics.
The loss, although not total, is great, because the opportunity to make the most
of the man's statistical ability is lost, and his ability as an economist, agricultural
scientist, engineer, or something else that he is not particularly fitted for, is [...]


A still less favorable circumstance, and unfortunately more common, is that in
which the teacher of statistics is not even selected for scholarship in the theory
of statistics. Studies in some other field, with some slight dabbling in the
application of statistical methods to it, plus a pleasing personality, have all too
frequently been thought to comprise sufficient qualifications for teaching statistical
methods and theory.

12. Unsatisfactory texts. The uncritical character of the teaching is reflected
in the long line of textbooks written by teachers who have not made any
genuinely fundamental study of statistics, but pass on to students in a magisterial
fashion what was passed on to them. Authority takes the place of derivations
and ultimate sources. It is no wonder that these textbooks, copied from each
other, contain increasing accumulations of errors, or that long delays have
intervened between the introduction of important new statistical methods and theories
in the periodical literature and their appearance in the textbooks and courses
put before students.

The latest discoveries in the theory of statistics affect what should be taught
in elementary courses, and no syllabus can be expected to survive more than a
few years of research. The development of new statistical methods and ideas
of overwhelming importance must be allowed to compete with material already
well established as true and useful. The new material is equally true and in
some cases even more useful than matter usually incorporated in the best of 
current courses and textbooks. 

13. Omission of probability theory from texts and teaching. One of the
important weaknesses in much of the current teaching of statistics is a failure to
make proper use of the theory of probability. Without probability theory,
statistical methods are of only minor value, for although they may put data into
forms from which intuitive inferences are easy, such inferences are very likely to
be incorrect. The objective weighing of the degree of confidence to be placed in
inductive conclusions is necessary to avoid fallacies. Indeed, the whole
foundation of descriptive statistical methods, of inductive inference, and of the design
of experiments, rests upon probability theory.

The relevance of probability to much statistical work was indeed questioned a
quarter-century ago by a group of economists impressed by the lack of
independence between consecutive observations, and this attitude, in conjunction with
an exaggerated and belated remnant of nineteenth-century empiricism, has had
a certain influence, particularly on the statistical methods in use by economists.
This view is now rapidly giving way to a tendency to use the powerful new
statistical methods discovered in the meantime. It is now perceived that efficient
objective methods can be used over a much wider range of cases than was formerly
supposed, because the independence assumed in their derivations refers not to
observations but to residuals from the theoretical model used. Furthermore,
research is under way, and has already achieved promising results, on the
extension of accurate methods to still more extensive classes of problems.
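
As an editorial illustration of the distinction between dependence among
observations and dependence among residuals, the following minimal sketch
simulates a series with a linear trend plus independent errors. The trend
coefficients, the sample size, the seed, and the helper names (lag1_autocorr,
fit_line, simulate) are assumptions made only for this example and do not come
from the report. Consecutive observations are strongly correlated because of
the trend, while the residuals from the fitted straight line are not.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def fit_line(ts, ys):
    """Ordinary least-squares intercept and slope for y = a + b*t."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def simulate(n=200, seed=1948):
    """Return lag-1 autocorrelation of the observations and of the residuals."""
    rng = random.Random(seed)
    ts = list(range(n))
    # Hypothetical model: linear trend plus independent normal errors.
    ys = [2.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in ts]
    a, b = fit_line(ts, ys)
    residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
    return lag1_autocorr(ys), lag1_autocorr(residuals)


if __name__ == "__main__":
    r_obs, r_res = simulate()
    print(f"lag-1 autocorrelation of observations: {r_obs:.3f}")
    print(f"lag-1 autocorrelation of residuals:    {r_res:.3f}")
```

The sketch covers only the simplest case, in which the theoretical model is a
straight line and the errors really are independent; it is not a test of
independence for actual economic series.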

C. PROPER QUALIFICATIONS OF TEACHERS OF STATISTICS

14. Statistics compared with other subjects. The qualifications appropriate
for teachers of statistical method and theory are not essentially different in degree
from those for teachers of other subjects in the same institutions; proficiency in
statistical method and theory is merely to be substituted for it in other subjects.
This substitution is, however, vital. It must not be imagined that proficiency
in some other subject in which statistical methods are used incidentally is
equivalent to proficiency in statistical method itself. The error of such a supposition,
if carried over into another field, might lead to the appointment of a man as
professor of chemistry on the ground that he could cook.

The first requisite of the college or university professor of any subject is a
profound and thorough knowledge of that subject. It is customary in the better
institutions at least to restrict appointments to the rank of assistant professor to
persons who have demonstrated scholarly qualifications by work equivalent to
that leading to a Ph.D. degree, including an original contribution to the subject
that the individual is to teach. Promotion to the higher ranks is conditioned
upon a number of criteria, among which published research is by far the most
important in those institutions.

15. Current research in statistical method is essential for teachers. Research
is even more essential in the teacher of statistics than in teachers of most other
subjects, because so much remains to be worked out that is of immediate
importance. Some college teachers do no research. This is usually regarded as
deplorable. The evil is, however, of quite different magnitude according to the
nature of what is taught by such teachers. In a new subject in which sharp
differences of opinion exist or have recently existed on fundamental questions,
in which current discoveries have an important bearing, and in which there have
not yet been the time and consensus necessary for the preparation of an adequate
and virtually error-free textbook, teaching without research may have calamitous
effects. The effective teacher must, of course, have teaching ability, but no
skill in pedagogy, no lustre of personality, can atone for teaching errors instead
of truth. Errors are very likely to be taught by those who do no research, and
then the more skillful the pedagogic indoctrination, the greater the harm.
Sound educational policy calls for devotion to research of a large fraction of the
time and energies of the teaching staff in a subject like statistical theory.
Students also are in particular need of encouragement to do original and critical
work in relatively new areas of this kind. They must be taught to shun the
use of formulae and methods given merely on authority without full and
convincing reasons, and to insist on looking closely and critically at assertions.

Even in the teaching of elementary statistical methods for direct practical
use by specific occupational groups, where it might be thought that the teaching
would most predominate over the research element, the teacher must face
difficult questions whose answers call for research in statistical theory. Let us
illustrate this by one example out of the many possible. In teaching the analysis
of variance for use in agricultural experimentation, questions arising out of the
possible non-normality of the underlying distributions must be dealt with in
some way. The formulae, even those in the best textbooks, are accurate only
if the distribution is normal, and neither this fact nor the non-normality of many
distributions should be concealed from the students. Obviously something more
needs to be said on the subject at this point. What the teacher can say depends
on how deep he has gone into a whole series of perplexing questions, on some of
which the views of scholars are not yet stabilized, and on which a tremendous
amount of research is needed before the maximum practical value can be attained
for a technique whose usefulness is already amazing.
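
One way to examine the analysis-of-variance question raised above is by
simulation. The following minimal sketch, offered as an editorial illustration
and not as the committee's method, estimates how often the usual F test at
the 5 per cent level rejects a true hypothesis of equal means when the
observations are normal and when they are drawn from a skewed (exponential)
distribution. The design (three groups of ten), the choice of the exponential
distribution, the seed, and the function names are all assumptions; the
critical value 3.354 is the tabulated upper 5 per cent point of F with 2 and 27
degrees of freedom.

```python
import random

# Tabulated upper 5% point of F with (2, 27) degrees of freedom,
# which matches the assumed design of 3 groups of 10 observations.
F_CRIT_2_27 = 3.354


def f_statistic(groups):
    """One-way analysis-of-variance F ratio for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rejection_rate(draw, reps=2000, k=3, n=10, seed=0):
    """Fraction of simulated experiments, all with equal true means,
    in which the F test rejects at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / reps


def normal_draw(rng):
    return rng.gauss(0.0, 1.0)


def skewed_draw(rng):
    # Exponential distribution: an assumed example of marked non-normality.
    return rng.expovariate(1.0)


if __name__ == "__main__":
    print("normal data:", rejection_rate(normal_draw))
    print("skewed data:", rejection_rate(skewed_draw))
```

Under normality the estimated rate should lie near the nominal 0.05; how far
it departs for other distributions depends on the distribution and the design,
which is the kind of question the text says still calls for research.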

16. Minimum requirements in mathematics for the training of teachers
and research men in statistical theory. Because research in the theory of
statistics requires advanced mathematics, and is indeed largely mathematical in
character, a mastery of a substantial amount of higher mathematics must be an
essential part of the training of prospective teachers of statistics. To specify
exactly what or how much mathematics is necessary would be a difficult task.
Something of the algebra of matrices and of the theory of functions are minimum
necessities, and a good deal of additional knowledge of algebra, geometry, and
analysis adds richness and power to the work of the statistical theorist, the
inventor of new statistical methods. On the other hand, the time of the graduate
student in statistics is much occupied with the theory of statistics itself; and some 
of his time should also go into the study of applied statistics. If the students 
entering a graduate school for advanced work in statistics went there equipped 
with a knowledge of matrix algebra and theory of functions and some additional 
higher mathematics, as is obtainable by undergraduates at some institutions, 
they would have time for applied statistics and could do some real work on 
applications. 

There is a cruel dilemma here, resulting from the delay in learning mathematics
imposed by the elementary curricula which have become customary in this
country. The weakness of the mathematical element in the prevailing curricula
affects both teachers and students of statistics to an extent justifying some
attention from those interested in the improvement of statistics. In American
universities elementary calculus is not often taught before the sophomore year, and
the more advanced parts of algebra come still later, if at all.

If calculus could be pushed down into the high schools and assumed as a
prerequisite for college courses in mathematics, statistics, economics, physics and
several other subjects, the efficiency of instruction in all these departments could
be increased. For example, the difficulties experienced by students of economics
with ideas of marginal cost, marginal revenue and the like correspond closely
with the difficulties experienced by mathematicians for centuries in trying to
define infinitesimals and derivatives, but now successfully overcome. The
student who really knows differential calculus need not have the slightest
difficulty with the marginal ideas of economics. Similarly in physics, the
fundamental concepts of speed, acceleration, potential theory, conductivity, thermal
capacity and radiation, are all mathematical and easier to grasp once and for
all as such than to be learned afresh with each new application from textbooks
in physics sometimes not clearly written and taught by teachers who must for
one reason or another avoid a mathematical approach.
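
The correspondence between marginal cost and the derivative can be made
concrete with a small worked example. The quadratic cost function below is
hypothetical, chosen only for illustration; C(q) denotes total cost at output q
and MC(q) the marginal cost.

```latex
\begin{align*}
C(q) &= 100 + 5q + 0.1\,q^{2}, \\
MC(q) &= \frac{dC}{dq} = \lim_{h \to 0} \frac{C(q+h) - C(q)}{h} = 5 + 0.2\,q, \\
C(51) - C(50) &= 5 + 0.1\,(51^{2} - 50^{2}) = 15.1, \qquad MC(50) = 15.
\end{align*}
```

The cost of one further unit (15.1) and the derivative (15) nearly coincide,
and the small gap between them is exactly the difficulty with infinitesimals
that the calculus resolves.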

The possibilities of teaching quite advanced mathematics to young children
have scarcely begun to be explored. Children of kindergarten age are fascinated
and thrilled by the wonders of topology. Groups and number theory can be
tremendous sensations in the fifth grade, though all these subjects are ordinarily
reserved for graduate students specializing in mathematics. What is lacking
is teachers who know mathematics and its applications and who possess enough
freedom to teach what they know instead of the long, dull and relatively useless
drill on problems of wallpaper hanging and the like, problems turning on mere
conventions which are quickly forgotten, painful repetitious work which makes
children resolve to quit mathematics as soon as possible.

D. NEED FOR RELATING THEORY WITH APPLIED STATISTICS

17. An example of the interaction between theory and practice. A professor
of psychology working with mental tests might enlist the assistance of a young
statistical theorist with mutual benefit. The young man might for a short time
do some of the drudgery of scoring tests and computing, passing on soon to the
problems of test construction and the distribution of various functions of
correlation coefficients. This last is on a new and exciting frontier of statistical
theory. The advancement of this frontier, which is really the main business of
the young man in his capacity as prospective statistical theorist, would in this
way come to him naturally as a problem or series of problems having a tangible
meaning additional to its mathematical content. The empirical context is in
such cases often of great value in suggesting suitable approaches, for example,
suitable approximations in the study of functions not susceptible to simple
mathematical representation in terms of elementary functions.

If the young theorist succeeds in extending the boundaries of multivariate
statistical analysis by discovering the distribution of some new function of
correlation coefficients, the chances are that this discovery will also have
applications in anthropology, medicine, banking, and other pursuits which in the
aggregate will greatly outweigh the application originally in view.

The discovery should be regarded primarily as a contribution to the general
theory of statistics, and published in a journal devoted to mathematical statistics.
It will then become available to a wide circle of teachers of statistics, who may
incorporate it into their courses, and its methods and results will be studied by
other investigators from the standpoint of possible generalizations and analogs.
The importance of the discovery would be much more limited if it were thought
of as a development in psychology and published only in a psychological journal.
Perhaps dual or multiple publication ought to be permitted in such cases, but the
first publication should be in a journal of mathematical statistics. Far too many
good statistical ideas have been buried in connexion with obscure special
applications.

18. Supplying opportunities for applications in graduate studies of statistics.
The statistician who does any work in applications must know statistics as an
art as well as a science. The theoretical statistician, if he wishes to be of the
utmost use to his colleagues in other disciplines, needs to know by personal
experience something of their lives and collateral problems. Indeed, experience
with applications, and the challenge of problems arising out of applications, have
played a most important part in the development of statistical theory. It
follows that the graduate student in statistics needs contact with applied
statistics which the institution should undertake to provide, or at least facilitate.
This need is next in importance after the needs for theoretical statistics and for
pure mathematics. The distribution of time among the three (theoretical
statistics, mathematics, and applied statistics) is hard to specify exactly, and
must in any case depend on the nature of the student’s previous work. If his
mathematical preparation has been full and rich, more time should be spent on
applied statistics in his graduate years than if he has already had substantial
contact with applied statistics in some other way but is deficient in higher
mathematics.

Applied statistics entails a somewhat detailed acquaintance with the field of
application. Such a field might be life insurance, or mental testing, or industrial
quality control, or sampling in the work of the Bureau of the Census or some
other government agency; it might be agricultural economics, or business cycles.
Proficiency in any such field calls for rather prolonged study, and it would be too
much to expect the embryo statistical theorist to reach this stage of advancement
in all subjects. He should, however, make more than a superficial study
of some chosen field of application. This study might or might not be at the
university. The requisite familiarity with applied statistics might in some cases
be acquired by work in a government bureau, or in a research organization studying
business cycles or something else involving applied statistics. What is most
desirable is that the work should have brought the student to the point both of
applying statistical methods in a reasonably effective way, and of perceiving the
limitations of existing statistical methods. Perception of existing limitations
has frequently been the germ of progress in the subject.

One satisfactory arrangement is an internship in statistical research, as is
currently provided by some institutions. By this arrangement, interns work
under competent leadership in various government or private agencies that are
engaged in large-scale statistical studies. The interns do research in theory,
adapt the physical circumstances to theory and vice versa, and have actual
practice in the design of experiments, construction of questionnaires, writing
of instructions and tabulation plans, analysis of the results and appraisal of
sampling variances.

D. RECOMMENDATIONS ON THE ORGANIZATION OF STATISTICAL TEACHING
AND RESEARCH IN INSTITUTIONS OF HIGHER LEARNING

19. Research should be encouraged; teaching schedules should not be overloaded.
Colleges and universities usually expect the members of their faculties
to engage in research as well as in teaching, the relative emphasis on these two
functions varying greatly from institution to institution and to a lesser extent
among departments within the same institution. Reasons why teachers of
statistics must do current research in order to teach the subject have already
been given in Art. 15. In the organization of statistical teaching it is thus of
extraordinary importance that colleges and universities emphasize research in
the theory of statistics as a leading part of the work of the teaching staff in this
field. Hours of teaching and other duties must be kept within such bounds as
to make research possible, the initial selection of teachers must be of persons
capable of research in statistics, and there must be provision of needed secretarial,
computational and other assistance. The library must be adequate, not only in
publications containing statistical theory, but in the larger field of pure mathematics
as well.

20. Organizing statistical service in the university. In addition to the customary
duties of teaching and research, faculty members expert in statistical
methods find that they cannot escape a third, viz., advice to their colleagues and
others regarding the statistical aspects of their problems. This often takes a
good deal of time. Clearly it is in the interest of the academic enterprise that
such services be provided. Scholars in many departments are finding that their
work is greatly improved by competent statistical advice not only in the interpretation
of their data but also in the design of their experiments and other
investigations. The provision of competent advice frequently requires extended
consideration of the general content of the problem as well as special analysis of
its statistical features. And initial advice often needs to be supplemented by
further service. The statistician, like the physician, often finds that one interview
at which a prescription is dispensed does not end the matter satisfactorily.

Teaching hours must be distinctly limited if statisticians are to be able to
render this service to the rest of the institution as well as maintain a high level
of research in their own field.

One way to handle the problem of statistical service, especially in a large
institution, is through a special organization devoted to this purpose. Such an
organization, whether called a Statistical Institute, a Department of Applied
Statistics, Statistical Laboratory, or something else, might supply not only
advice but a more active kind of assistance, including computational and
chart-drawing services.


A statistical service organization should be removed from the teaching of statistics
only to the extent necessary to gain the advantages of some degree of specialization
and to prevent undue interruption of the teacher’s other work of teaching
and of research in theory. There are distinct advantages for all parties in a
fairly close connexion between practical statistical work, research in statistical
theory, and statistical teaching. Each of these activities benefits the others,
provided only that it does not take away from it too much time. Research in
statistical theory, like medical research, needs frequent revitalizing injections of

[...] of contact with students, [...] vigorous both by research in
[...] applications with which students can be confronted.
And the needs of applications are better met if through an organization
such as is here envisaged they can be brought to the attention of appropriate
specialists, and if also students can be enlisted when needed for their treatment.

A university organization dealing with statistics may properly comprise two
parts with overlapping personnel, one devoted chiefly to applied statistics, the
other to theoretical statistics. The teaching might be done by both, but at
least at the more advanced levels would be primarily the concern of the theoretical
part. Migration between the two ought to be easy and frequent, though some
individuals are so definitely adapted to one kind of work or the other as to make
it undesirable to have fixed rules calling for periodic transfers.

In smaller institutions it may not be practicable to have statistical organizations
sufficiently well staffed to provide adequate consulting service. To meet
the needs in some of these cases regional centres for advice and service in applied
statistics might be established at large universities throughout the country,
with access made readily available for sister institutions. These centres might
also carry on work in applied statistics in behalf of government agencies and other
organizations, much as various agricultural colleges have for years been carrying
on cooperative work with the federal Department of Agriculture.

The question how far, if at all, such a university centre of applied statistics
should go into the market place and engage commercially in service to business
concerns is a debatable one. While there may be favorable reactions upon
scientific work, there are grave dangers to the intellectual integrity of the
institution which need serious consideration.

21. Organization for teaching. Passing from questions of personnel and the
research and service functions of academic statisticians to teaching itself, we have
to consider problems of departmental organization, of course contents, of systems
of prerequisites, and of methods of teaching. All these we consider secondary
problems, not in the sense of being unimportant, but because we believe that
proper solutions of them will be reached with reasonable promptness when
personnel of the kind described in Sec. C of this report are at work in some such
general setting as has just been described. The ideas recorded below are general
in character and are to be regarded as a starting-point for developing a program
in a particular institution, once suitable faculty members have been obtained.

The teaching of statistics may be organized in any of the following ways: 

a. In a department of theory and a department of applied statistics, both
forming an Institute of Statistics.

b. In a single Department of Statistics.

c. Under an inter-departmental committee.

d. Under the exclusive jurisdiction of the Department of Mathematics.

e. It may be scattered among heterogeneous departments of application,
without formal coordination.

Only a few large institutions will be in position to adopt the first plan. It is
likely that the second will be most suitable for the majority. The third should
probably be regarded as a makeshift for the transitional period until a proper
department of statistics can be organized, a step that will not at the moment be
reasonably possible for most institutions because the right kind of scholarly
personnel does not exist in adequate numbers. It is of course possible that some
vestige of an inter-departmental committee, perhaps in the form of an Advisory
Board, might be a useful adjunct of a department of statistics in order to keep it
informed of the needs of applications. It is also possible that something of the
sort might function with respect to a department of mathematics, or any other
department. On the other hand, the desired consultations and adjustments
might be accomplished in less formal ways.

To make statistics a subdivision of a mathematics department is a solution
that will appeal to administrators desirous of keeping down the number of departments.
The subject-matter of statistics is to a sufficient extent mathematical
to give some apparent weight to this plan, and some mathematicians have the
unsound idea that any mathematician can teach statistics without specialized
study or experience in application. On the other hand, statistics has some
features uncongenial to traditional mathematics, arising partly from the urgency
of practical needs which go beyond what can immediately be provided by rigorous
mathematical theory. Again we may cite the problem in the teaching of the
analysis of variance of what to do about possible non-normality of the underlying
distribution (Art. 15). The user of this technique has the responsibility of
verifying that the situation conforms to the assumptions, including that of normality,
underlying the tabulated probability criteria. But he is in a very poor
position to do this in a large proportion of the applications actually made of the
analysis of variance. Yet the analysis of variance in some form (possibly
through the use of rank-order numbers or through a transformation or some other
auxiliary device) remains the one powerful means of attacking a very large and
important class of practical situations. The practicing statistician needs to do
some highly educated guessing on such matters, guessing that will be assisted
but not made determinate by knowledge of a considerable range of mathematical
truths regarding approaches to the normal distribution, moments of the variance-ratio
in samples from non-normal populations, asymptotic large-sample theory,
and other such topics. His mathematical insight needs to be supplemented by
consideration of the particular subject-matter of application. Moreover, it is
desirable that students of statistics have some practice with actual empirical
data designed to develop the art of guessing in such ways.
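
One of the auxiliary devices mentioned above, the replacement of observations by their rank-order numbers before the variance ratio is computed, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the report prescribes no particular procedure, the one-way layout is chosen for simplicity, and the function names `midranks`, `variance_ratio` and `variance_ratio_on_ranks` are hypothetical.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Rank-order numbers 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def variance_ratio(groups: Sequence[Sequence[float]]) -> float:
    """One-way analysis of variance: between-group over within-group mean square."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (between / (len(groups) - 1)) / (within / (n - len(groups)))


def variance_ratio_on_ranks(groups: Sequence[Sequence[float]]) -> float:
    """Rank the pooled observations, then compute the variance ratio of the ranks."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    ranked, start = [], 0
    for g in groups:
        ranked.append(ranks[start:start + len(g)])
        start += len(g)
    return variance_ratio(ranked)
```

Because only the ordering of the observations enters, the ratio computed on ranks is unchanged by any increasing transformation of the data, which is the sense in which the device sidesteps the exact form of the underlying distribution. How the resulting ratio should be referred to tabulated criteria is a separate question that this sketch does not settle.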

Another example of non-rigorous mathematics used extensively in statistics is
[...] errors found by the differential method. It is desirable that good
mathematics replace bad in such connexions, [...] practical statisticians
have been driven, that even bad mathematics may be better than none
[...] good mathematics along these lines can come only through
[...] serious studies of statistics, though a sufficiently
[...] eventually be led by such a student of
[...] take and complete the necessary research. Practical needs
make approximations necessary; the goodness of a particular approximation can
often be judged adequately by a statistician familiar with the particular application
long before the heavy artillery of advanced mathematical analysis can be
brought up.

The teacher of statistics must have a genuine sympathy and understanding for
applications, and these are not possessed by many pure mathematicians, at least
in the opinion of some of those concerned with the applications; and it is this
opinion rather than the possible fact that is of interest at the moment. For so
long as such an opinion is maintained, for example by psychologists and economists,
these specialists will be suspicious that courses in statistics given by a
department consisting largely of pure mathematicians are unsuitable for their
purposes. The result is likely to be a sabotaging of attempts at centralization,
the different departments reverting to the old and ultimately objectionable
system of teaching their own separate courses in statistical methods.

These difficulties are not necessarily insuperable, and it is to be expected that
many medium-sized and small institutions will make their mathematical departments
responsible for statistical teaching. But this ought not to be done without
a consideration of the possible dangers.

22. The statistical curriculum. We next consider curricular problems. These
may be divided into those of the graduate school and those of the undergraduate
college. Those of the graduate school may in turn be divided into those of
specialization in statistics and of auxiliary teaching of statistics to students in
other departments, such as sociology, who need to use statistical methods, have
not studied them sufficiently as undergraduates, and cannot afford to put much
time on them. Of these two subdivisions the number of students at present is
greater in the second and the ultimate importance is greater in the first, because
the whole future of statistics depends on improvement and enlargement of this
graduate teaching.

The incidental teaching of elementary statistical methods to graduate students
in other subjects, without any prerequisite in mathematics or statistics, cannot
equip these students with a command of the subject at all comparable to that
which could be obtained by a better integration of undergraduate with graduate
work. A prospective sociologist, economist, psychologist, or physicist ought to
study elementary statistical methods and concepts while still an undergraduate,
and without special reference to his ultimate field of specialization.

The features of statistical methods peculiar in their applications, beyond what
is taught through illustrations and exercises in an elementary course, may be
fit material for a course, graduate or undergraduate, in a department of the
application. Such a course should require as a prerequisite an elementary course
in a department of statistics, or at least one taught by specialists in statistical
method and theory.

For the undergraduate college, in place of the sporadic offerings now current in
different departments, we recommend a combination of two general fundamental
courses with a number of advanced courses. Of the latter some will be specialized
to the work of particular departments or groups of departments.



Of the two fundamental courses one will require calculus as a prerequisite, the
other only a knowledge of first-year algebra. It is to be hoped that the less
mathematical of these two general statistical courses, instead of being elected by
a majority of students, will gradually approach extinction, while the course based
on calculus will become the vital point of contact of the student body with the
concepts of statistics. The chief reason for insisting upon the importance of
calculus as a prerequisite is simply the possibility of covering important statistical
theory that is inaccessible to those who do not have it.

Modern statistical methods are based on the theory of probability. The
general courses in statistics may therefore well begin with elementary probability.
The duality between probability and statistical concepts,¹ for example between
probability and relative frequency, between mathematical expectation and a
sample mean, between parameter and statistic, should be explained. Derivations
and the place of the normal distribution should be sketched, and the Student
distribution should be derived and applied to a variety of problems in the first
course based on calculus. Later courses given by the department of statistics,
or whoever specializes in statistical theory, will naturally cover other statistical
methods and theories. At the same time useful courses can be offered in economic
statistics, mental testing, and other fields using statistical methods by
specialists, regardless of departmental affiliation. There might be departmental
cooperation; for example, the department of statistics might offer elementary and
advanced courses in correlation and multivariate analysis, and the department of
psychology might require these as prerequisites for some of its work in mental
testing.

The teaching of statistics should be accompanied by considerable work in
applied statistical problems, as well as exercises in mathematical theory, on the
part of the students. A large part of this work in applied statistics is best conducted
in a laboratory equipped with calculating machines, mathematical tables,
drafting instruments, and other appurtenances.

Statistical laboratories require supervision, administration and maintenance.
They are needed not only for the purpose of teaching statistics, pure and applied,
at all levels, but also by research workers in many fields. There are possible
gains of efficiency and economy in a centralized administration of them. One
suggestion is that they be under the supervision of the university library.
Another is that responsibility for them be lodged in a central department of
statistics, or in a two-department statistical institute. Centralization can be
carried too far, and it is likely that some units in a large organization will find
it advantageous to have machines which are exclusively their own. The conflicting
claims regarding machines and laboratories will require careful weighing.

23. Statistical method as a part of a liberal education. A question may also
be raised as to whether some work in the statistical method should not be required
of all college students as a part of a liberal education. This would be
a novel step, but has much to be said for it in view of the widespread use of
statistics and growing interest in statistics. Another point is that the student
who can’t make up his mind as to his ultimate field of specialization or vocation
will do well to study those things that can be used in many fields. Of such things,
mathematics and statistics are leading examples. There are more or less sound
objections to systems of required studies, but if we are to have them, the claim
of statistics should not be rejected merely on grounds of novelty.

¹ Cf. the article “Frequency distribution,” Encycl. of the Social Sciences (1931).



ABSTRACTS OF PAPERS 

Presented December 22, 1947 at the Berkeley Meeting of the Institute

1. The Performance Characteristic of Certain Methods for Obtaining Confidence
Intervals. B. M. Bennett and J. Neyman, University of California,
Berkeley.

Certain methods for obtaining confidence limits have been introduced by Bliss, R. A.
Fisher and Paulson. Thus, e.g., let $x_i, y_i$ $(i = 1, \dots, n)$ represent a sample from a
bivariate normal population with means $E(x_i) = \xi$, $E(y_i) = \alpha\xi$ and variances and covariance
$\sigma_x^2, \sigma_y^2, \sigma_{xy}$. If $\bar{x}, \bar{y}, S_x^2, S_y^2, S_{xy}$ are the sample means, variances and covariance
respectively, then in order to determine confidence limits for $\alpha$, the ratio:

$$t = \frac{\sqrt{n}\,(\bar{y} - \alpha\bar{x})}{\sqrt{S_y^2 - 2\alpha S_{xy} + \alpha^2 S_x^2}}$$

may be referred to the appropriate value $t_\epsilon$ of the Student-t distribution. The inequality
$|t| \le t_\epsilon$ may, in general, be solved as a quadratic equation in $\alpha$ to yield two values $\alpha_1, \alpha_2$
which are presumed to be confidence limits for $\alpha$. In this paper the probability $\pi$ of being
correct in using such a procedure, i.e., the performance or operating characteristic, is computed
in the limiting case when $\sigma_x^2, \sigma_y^2, \sigma_{xy} = \rho\sigma_x\sigma_y$ are assumed to be known. It is shown that
$\pi$ is a function $\pi(\alpha, \xi, \sigma_x, \sigma_y, \rho)$ of all the parameters, and in particular of $\alpha$ itself, the quantity
for which confidence limits are supposed to be provided. Similar "quadratic" methods
are also used in certain regression problems, e.g., in determining confidence limits for a
value of $x$ corresponding to an additional value of $y$ when a previous sample regression of $y$
on $x$ is available; or in determining confidence limits for the intersection point of two population
regression lines. The performance characteristic of each of those methods is shown
to be a function of the quantity for which the method gives confidence limits.
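
The "quadratic" step described in this abstract can be illustrated numerically. Squaring the inequality that bounds the Student ratio by its tabulated value gives a quadratic inequality in the unknown ratio of means, and its two roots are the presumed limits. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name `quadratic_limits` and its argument names are hypothetical, and the tabulated Student value is supplied by the caller.

```python
import math
from typing import Optional, Tuple


def quadratic_limits(
    n: int,
    x_bar: float,
    y_bar: float,
    s_xx: float,   # sample variance of x
    s_yy: float,   # sample variance of y
    s_xy: float,   # sample covariance of x and y
    t_crit: float  # tabulated Student value (assumed supplied by the caller)
) -> Optional[Tuple[float, float]]:
    """Roots of n*(y_bar - a*x_bar)**2 = t_crit**2 * (s_yy - 2*a*s_xy + a**2*s_xx).

    Returns None when the inequality does not define a finite interval
    (leading coefficient not positive, or no real roots).
    """
    a = n * x_bar ** 2 - t_crit ** 2 * s_xx
    b = n * x_bar * y_bar - t_crit ** 2 * s_xy
    c = n * y_bar ** 2 - t_crit ** 2 * s_yy
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return ((b - root) / a, (b + root) / a)
```

The `None` branch reflects the qualification "in general" in the abstract: when the mean of the denominator variable is not clearly different from zero, the quadratic need not yield two finite limits.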

2. Some Further Results on the Bernoulli Process. T. E. Harris, Douglas
Aircraft Co.

Let $z_1, z_2, z_3, \dots$ be a sequence of random variables defined as follows: $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \dots, k$. If $z_n = 0$, $z_{n+1} = 0$. If $z_n = r$, $r \ne 0$, then $z_{n+1}$ is distributed as the
sum of $r$ independent random variables, each having the same distribution as $z_1$. It is
assumed that $x < 1$, where $x = E(z_1)$. Let $N$ be the smallest value of $n$ such that $z_{n+1} = 0$.
A method is given for obtaining an expansion of the moment-generating function of $N$.
In the case where $p_r = 0$ for $r \ge 3$, this expansion takes the form $1 + (1 - e^{-s})(1 - p_0)F(s)$,
where $F(s) = f_1(s) - p_2(1 - p_0)f_2(s) + 2x p_2^2(1 - p_0)^2 f_3(s) - \cdots$, where $f_1(s) =
(e^{-s} - x)^{-1}$, and $f_n(s) = f_{n-1}(s)(e^{-s} - x^n)^{-1}$. Certain restrictions on the constants $p_r$
insure that this expansion converges for a complex neighborhood of $s = 0$.
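
The process defined in this abstract is easy to simulate, which gives a rough empirical check on any expansion for the distribution of the extinction index. The following is a minimal sketch, not the author's method: the function name `extinction_index`, the generation cap and the example offspring probabilities are assumptions made for illustration.

```python
import random
from typing import Sequence


def extinction_index(
    p: Sequence[float],
    rng: random.Random,
    max_generations: int = 10_000,  # hypothetical safety cap
) -> int:
    """Return the smallest n with z_{n+1} = 0, where P(z_1 = r) = p[r]."""
    size = 1  # a single ancestor, so that z_1 has the offspring distribution itself
    for n in range(max_generations):
        # z_{n+1} is the sum of `size` independent copies of z_1
        size = sum(rng.choices(range(len(p)), weights=p, k=size))
        if size == 0:
            return n
    raise RuntimeError("no extinction within max_generations")


if __name__ == "__main__":
    rng = random.Random(2024)
    offspring = [0.5, 0.3, 0.2]  # example values only; mean 0.7 < 1
    draws = [extinction_index(offspring, rng) for _ in range(10_000)]
    print(sum(draws) / len(draws))  # empirical mean of the extinction index
```

With a mean number of offspring below one, extinction is certain, so the cap is reached only in extremely rare runs. The proportion of simulated runs with index zero should be close to the probability of no offspring.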

3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions.
E. L. Lehmann and C. M. Stein, University of California, Berkeley,
California.

Critical regions are determined for testing a composite hypothesis, which are most powerful
against a particular alternative among all critical regions whose probabilities under the
hypothesis tested are bounded above by the level of significance. These problems have
been considered by Neyman, Pearson and others, subject to the condition that the critical
region be similar. In testing the hypothesis specifying the value of the variance of a normal
distribution with unknown mean against an alternative with larger variance, and in some
other problems, the best similar region is also most powerful in the sense of this paper.
However, in the analogous problem when the variance under the alternative hypothesis
is less than that under the hypothesis tested, in the case of Student’s hypothesis when the
level of significance is less than 1/2, and in some other cases, the best similar region is not
most powerful in the sense of this paper. There exist most powerful tests which are quite
good against certain alternatives in some cases where no proper similar region exists.
These results indicate that in some practical cases the standard test is not best if the class
of alternatives is sufficiently restricted.

4. On the Selection of Forecasting Formulas. Paul G. Hoel, University of
California, Los Angeles, California.

Given two competing formulas, $u = g(a_1, \dots, a_n)$ and $v = h(z_1, \dots, z_m)$, for forecasting
a variable $x$, a significance test possessing optimum properties is designed for deciding
whether one formula yields significantly better forecasts than the other. The test, which
turns out to be a Student t test, is constructed as a test of the hypothesis $H_0: m_i = u_i$ against
the alternative $H_1: m_i = v_i$ $(i = 1, \dots, n)$, in which it is assumed that the variables
$x_1, \dots, x_n$, corresponding to the $n$ samples, are independently normally distributed with
means $m_i$ and variances $\sigma_i$ = [...].

5. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher
Problem. J. E. Walsh, Douglas Aircraft Company.

The most powerful t-test solution of the Behrens-Fisher problem (one-sided and symmetrical)
was obtained by Scheffé in Annals of Mathematical Statistics, Vol. 14 (1943), pp.
35-44. This note derives (approximately) the power efficiency of this t-test for the case
in which the ratio of the variances of the normal populations is also known. Let the t-test
be based on $m$ sample values from the first normal population and $n$ sample values from the
second normal population, where $m \le n$. For fixed values of $m$ and $n$, a symmetrical
t-test with significance level $2\alpha$ has the same power efficiency as a one-sided t-test with
significance level $\alpha$. For one-sided t-tests with significance level $\alpha$, the power efficiency
is approximately $50[B + \sqrt{B^2 - 8(m + n)A}]/(m + n)$, where $B = 2 + (m + n)A + K_\alpha/2$,
$A = 1 - K_\alpha/2(m - 1)$, and $K_\alpha$ is the standardized normal deviate exceeded with probability
$\alpha$. This approximation is reasonably accurate for $m \ge 4$ if $\alpha = .05$, $m \ge 5$ if $\alpha = .025$,
$m \ge 6$ if $\alpha = .01$, $m \ge 7$ if $\alpha = .005$. Intuitively the power efficiency of a test measures
the percentage of available information per observation which is utilized by that test.

6. On Sequences of Experiments. Charles Stein, University of California,
Berkeley, California.

One performs a sequence of $N$ experiments to decide between two simple hypotheses
regarding probability distributions of certain observable quantities. At each stage there
is a choice among $L$ experiments and the one chosen yields a random variable. One wishes
to achieve certain upper bounds $\alpha$ and $\beta$ to the probabilities of first and second kind errors
respectively, and, subject to these restrictions, to minimize the expected cost under a third
hypothesis. The cost of each particular sequence of experiments is known. A solution
is obtained, essentially by applying Lagrange’s method and working back from the end
of the experiment. This can be generalized to multiple decision problems. The results
are applied to two-sample tests with the second sample of variable size, and to Wald’s
sequential analysis. As another problem, suppose $(X_1, Y_1), (X_2, Y_2), \dots$ are independently
distributed with bivariate normal distributions having mean $\xi$ and covariance matrix
$\Sigma$, both unknown. One tests $H_0: \xi = 0$ against $H_1$: [...]. A test (not necessarily
optimum) valid within the usual approximation is obtained from the ratio of the p.d.f.
of Hotelling’s $T^2$ under $H_1$ to that under $H_0$. Analogous results hold for the multiple
correlation coefficient, ratio of two variances and test for linear hypothesis.

7. The Effect of Selection Above Definite Lower Limits of Linear Functions of
Normally Distributed Correlated Variables on the Means and Variances of
Other Linear Functions. G. A. Baker, University of California, Davis,
California.

Sometimes certain variables in a system can be observed before other economically or
socially important variables. These variables or linear combinations of them can be used
as a basis of selection at given levels. The question is: How does selection on these earlier
or more easily available variables affect the mean and variance of the economically or socially
more important variables or, perhaps, linear functions of the more important variables?
The general procedure is clear. We transform to a new system of variables which
contains the linear functions on which selection is performed and the linear functions of
which the means and variances are required as separate variables. The remaining new
variables are eliminated by integration. The final calculation involves the numerical
evaluation of integrals whose integrands are the product of polynomials and normal multivariate
functions and whose limits depend on the given levels of selections. The general
ideas are simple but the actual labor of computation in a given case is tedious. An example
is considered in detail.
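
The simplest instance of the question posed in this abstract has a closed form and may help fix ideas. Assume (for illustration only; this special case is not taken from the abstract) two standardized normal variables with correlation `rho`, with selection of the cases in which the first exceeds a given `cutoff`. The standard truncated-normal results then give the mean and variance of the second variable among the selected cases. The function name `selection_effect` is hypothetical.

```python
import math
from typing import Tuple


def selection_effect(rho: float, cutoff: float) -> Tuple[float, float]:
    """Mean and variance of Y given X > cutoff, for standard bivariate normal (X, Y).

    Uses E[X | X > c] = lam and Var[X | X > c] = 1 - lam * (lam - c),
    where lam = pdf(c) / P(X > c), together with the linear regression of Y on X.
    """
    pdf = math.exp(-0.5 * cutoff ** 2) / math.sqrt(2 * math.pi)
    upper_tail = 0.5 * math.erfc(cutoff / math.sqrt(2))
    lam = pdf / upper_tail
    mean_y = rho * lam
    var_y = 1 - rho ** 2 * lam * (lam - cutoff)
    return mean_y, var_y
```

Selection raises the mean of the correlated variable and reduces its variance, both effects vanishing when the correlation is zero. The general case of several linear functions treated in the abstract requires numerical integration and is not covered by this sketch.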


8. An Inversion Formula for the Distribution of a Ratio of Random Variables.
J. Gurland, University of California, Berkeley, Calif.

The repeated Cauchy principal value of integrals applied to characteristic functions is
used in obtaining inversion formulae for distribution functions. Let the random variables
$X_1$ and $X_2$ have a joint distribution function with corresponding characteristic function
$\phi(t_1, t_2)$. Suppose $P\{X_2 \le 0\} = 0$. Let

$$\oint g(t)\,dt = \lim_{\epsilon \to 0,\; T \to \infty} \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) g(t)\,dt$$

for any function $g(t)$. If $G(x)$ is the distribution function of $X_1/X_2$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i} \oint \frac{\phi(t, -tx)}{t}\,dt.$$

This formula is free of restrictions which accompany the formula
given by Cramér in the case where $X_1$ and $X_2$ are independent; and differentiation extends
a result of Geary to a much larger class of distribution functions. Further generalisations
of the theory are obtained, and as an example the distribution function of the ratio of quadratic
forms of random variables $X_1, X_2, \dots, X_n$ is considered in the case where $X_1, X_2, \dots, X_n$
have a multivariate normal distribution.


9. Independence of Parameters and Sufficient Statistics. E. W. Barankin,
University of California, Berkeley, California.


[...] and minimal set [...] for a class of families of probability densities [...].
[...] of each of these sets is determined as the rank of a certain [...].
[...] differentiability is eventually required of the function [...]
designed to ensure that the behavior of $p$ [...]
when only continuous differentiability is assumed.
The problem of determining the order of a minimal set of sufficient statistics
is made, by a certain device, to become identical in character with that of finding the order [...]
An explicit method is given for finding a complete set of independent parameters and a
minimal set of sufficient statistics.


(Presented December 30, 1947 at New York at the Annual Meeting of the Institute) 

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from
a Fitted Fourier Series (Preliminary Report). R. L. Anderson, University
of North Carolina, Raleigh, North Carolina and T. W. Anderson, Columbia
University.

Given a set of N observations {A<{, which are defined as follows: 

Xi — Hi = p- (Xi — L — Hi — L) + fi, 

where the residuals lt,l are assumed to he normally and independently distributed with 
sero means and equal variances and L is the lag. A statistic for testing the null hypoth¬ 
esis: p >= 0 is iR, the oiroular serial correlation coefficient of residuals et from a regression 
line fitted by least squares; X, <= Mi + a. The following regression line is considered: 

.¥,-ao4-i;^a*Cos^+£' h&r.—, 

where k ranges over some subset of the integers 1, 2, ••• , UN — 1) or i(N), depending 
on whether N is odd or even (if N is even, hjjyr is not used). Hence is defined ns’ 

T. ®irf2 + • • • + cv cm-v 

- 2^1 - 


with c, . 

The distribution of this iB has the same general form as that presented by R. L. Anderson 
for p = 0 [“Distribution of the serial correlation coefficient,’’ Annals of Math. Statistics 
13:1-13(1942)]; and for p 0 by W. G. Madow [“Note on the distribution of the serial 
correlation coefficient,’’ Annals oj Malh. SlatisUcs 16 308-310(1945)]. 

N 

For Mi consisting of terms of only one period, — = 2, 3, 4, 6,12 and 24, exact values 

of the 1% and 5% significance levels of {}_L R have been computed for N = 12 and 24.
Approximate significance levels have been computed for N = 12(12)96. More of the exact
significance levels are being computed, and all computations will be extended to include
some multiple periods and some lags greater than 1.
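
The statistic described in this abstract can be illustrated with a minimal Python sketch. The function names are hypothetical, the observations are indexed from 0 rather than 1, and the closed-form coefficients assume that every fitted harmonic k satisfies 0 < k < N/2, so that the sine and cosine regressors are orthogonal. It is an illustration of the definition, not the authors' computing procedure.

```python
import math


def fourier_residuals(x, harmonics):
    """Residuals after a least-squares fit of a mean plus the given harmonics.

    Assumes 0 < k < N/2 for every k in `harmonics`, so that the regressors
    are orthogonal and each coefficient has a closed form.
    """
    n = len(x)
    a0 = sum(x) / n
    fitted = [a0] * n
    for k in harmonics:
        cos_k = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
        sin_k = [math.sin(2 * math.pi * k * i / n) for i in range(n)]
        a_k = 2 / n * sum(xi * c for xi, c in zip(x, cos_k))
        b_k = 2 / n * sum(xi * s for xi, s in zip(x, sin_k))
        fitted = [f + a_k * c + b_k * s for f, c, s in zip(fitted, cos_k, sin_k)]
    return [xi - f for xi, f in zip(x, fitted)]


def circular_serial_correlation(e, lag=1):
    """Sum of e[i] * e[i + lag] (indices taken modulo N) over the sum of squares."""
    n = len(e)
    denom = sum(v * v for v in e)
    if denom == 0:
        raise ValueError("all residuals are zero")
    return sum(e[i] * e[(i + lag) % n] for i in range(n)) / denom
```

For a series of N = 12 observations with a single fitted period of 12, one would call `circular_serial_correlation(fourier_residuals(x, [1]), lag=1)`; significance levels for the result are not computed here.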

2. Some New Methods for Distributions of Quadratic Forms. Harold
Hotelling, Institute of Statistics, University of North Carolina, Chapel Hill.

Any homogeneous quadratic form in normally distributed variates of zero means has
the same distribution as q = \tfrac{1}{2}(a_1 x_1^2 + \cdots + a_n x_n^2), where the a_i are roots of a
determinantal equation based on the coefficients of the given form and the parameters of the
normal distribution, and where the x_i are normally and independently distributed with zero
means and unit variances. We take \sum a_i = n, and begin by expanding the distribution of a
positive definite form in a series of powers of q whose coefficients are polynomials in the
reciprocals of the a_i. This series shows the analyticity of the function, which is then
expressed as the product of a \chi^2 distribution function by a series of Laguerre polynomials
with coefficients which are simple polynomials in the moments of the a_i. Indefinite forms and
certain ratios of forms are dealt with by convolutions of these series and by other means.
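
As a small numerical aid to the canonical form above, the following Python sketch scales a set of weights so that they sum to n and returns the mean and variance of q = (1/2) * sum(a_i * x_i^2) for independent unit normal x_i. The function names are hypothetical, and only the first two moments are computed; the series expansions of the abstract are not attempted.

```python
def normalized_weights(a):
    """Rescale the weights a_i so that they sum to n = len(a)."""
    n = len(a)
    total = sum(a)
    return [n * ai / total for ai in a]


def quadratic_form_moments(a):
    """Mean and variance of q = 0.5 * sum(a_i * x_i**2), x_i independent N(0, 1).

    Uses E[x**2] = 1 and Var[x**2] = 2 for a unit normal variate.
    """
    mean = 0.5 * sum(a)
    variance = 0.5 * sum(ai * ai for ai in a)
    return mean, variance
```

With all a_i equal to 1 the moments are n/2 and n/2, those of one half of a chi-square variate with n degrees of freedom; unequal weights with the same sum keep the mean and increase the variance.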





3. Frequency Functions Defined by the Pearson Difference Equation. Leo
Katz, Michigan State College, East Lansing, Michigan.

Frequency “links” formed from the Pearson difference equation provide an efficient
means of fitting functions to observed distributions. These links, involving three constants
which are determined by the first four moments of the observed series, correspond to a
three-parameter family of discrete frequency functions. This family of functions is just
as broad as that defined by the differential equation, containing functions of equally diverse
types; in addition, it has the very important advantage that the graduation process is the
same for any type. Further, the simpler functions of the family all correspond to points
lying in one plane of the parameter space. This plane, giving a two-parameter family
of functions (depending upon the first three moments), is studied intensively, rather
complete results being obtainable for areas, moments, sampling characteristics of moments,
etc. It is also shown that the problem of discrimination among simple discrete frequency
functions for graduating observed data is resolvable (in the plane) to the sampling
distribution of one statistic. A special case of the two-parameter family depending on only the
first two moments was previously discussed.

4. Distribution of the Sum of Roots of a Determinantal Equation under a
Certain Condition. D. N. Nanda, Institute of Statistics, University of North
Carolina, Chapel Hill.

Let x = ||x_{ij}|| and x* = ||x*_{ij}|| be two p-variate sample matrices with n_1 and n_2 degrees
of freedom. Then S = xx'/n_1 and S* = x*x*'/n_2 are, under the null hypothesis, independent
estimates of the same population covariance matrix. The distribution of a root, specified
by its rank order, of the determinantal equation |A - \theta(A + B)| = 0, where A = n_1 S
and B = n_2 S*, has already been given by S. N. Roy, and by the author, who has also
obtained the limiting distribution of any root when one of the samples becomes infinitely
large. The moment generating function of the sum of the roots when n_1 = p ± 1 can be
derived from the limiting distribution of the largest root. The probability distributions
of the sum of roots under this condition have been formulated for the determinantal
equations having two, three, and four roots. The moments of these distributions have also
been obtained. The method is applicable for the determinantal equation of any order.
These probability distributions can easily be tabulated, as they involve only simple
algebraic and incomplete beta functions.

5. Applications of Carnap's Probability Theory to Statistical Inference.
Gerhard Tintner, Iowa State College, Ames, Iowa.

The new theory of probability of Rudolf Carnap (“On inductive logic,” Philosophy of
Science, vol. 12, 1945, pp. 72 ff.; “The two concepts of probability,” Philosophy and
Phenomenological Research, vol. 5, 1944, pp. 513 ff.) introduces a distinction between
probability_1, the degree of confirmation, and probability_2, related to relative frequency. It is
believed that the ideas developed are useful in clarifying the problems of statistical
inference.

As an example, consider the case of “inverse inference,” i.e. inference from a sample to
the population. The evidence is that in a sample of size s there are s_1 individuals with
a certain property M and s_2 = s - s_1 without the property. The hypothesis is that in the
population consisting of n individuals there are n_1 individuals with property M and
n_2 = n - n_1 individuals without this property. The degree of confirmation is then:






In this formula we have: w_1, the logical width of the property M, w_2 the logical width of
the property non-M, k = w_1 + w_2. It should be noted that for w_1 = w_2 = 1 the formula
becomes the classical result, i.e. a term of the hypergeometric distribution.

This idea may be applied to statistical estimation. We could for instance choose n_1
in such a fashion that c* becomes a maximum. This would be estimation by the principle
of maximum degree of confirmation, analogous to maximum likelihood. In a similar fashion
we may also use c* to establish limits for n_1 similar to confidence or fiducial intervals.
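
The classical special case mentioned above can be illustrated in Python. This is only a sketch under stated assumptions: it uses the ordinary hypergeometric term C(n_1, s_1) C(n - n_1, s - s_1) / C(n, s), regarded as a function of n_1, and does not reproduce Carnap's c* for general logical widths, whose formula does not survive in this text. The function names are hypothetical.

```python
from math import comb


def hypergeometric_term(n, n1, s, s1):
    """Chance of s1 individuals with the property in a sample of s, drawn
    without replacement from n individuals of which n1 have the property."""
    if s1 > n1 or s - s1 > n - n1:
        return 0.0
    return comb(n1, s1) * comb(n - n1, s - s1) / comb(n, s)


def most_supported_n1(n, s, s1):
    """Value of n1 that maximizes the hypergeometric term for the observed sample."""
    candidates = range(s1, n - (s - s1) + 1)
    return max(candidates, key=lambda n1: hypergeometric_term(n, n1, s, s1))
```

For a population of 10 and a sample of 4 containing 2 individuals with the property, `most_supported_n1(10, 4, 2)` picks n_1 = 5, the analogue in this special case of a maximum likelihood estimate.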

6. Circular Probable Error of an Elliptical Gaussian Distribution. Hallett H.
Germond, S. W. Marshall & Co., Consulting Engineers, Washington, D. C.

Preliminary tables are presented, giving the radii of distribution-centered circular 
cylinders enclosing various percentages of the volume under an elliptical bivariate Gaussian 
surface. These tables are further interpreted in terms of a correlated bivariate Gaussian 
distribution. The application of these tables to impact analysis is illustrated. 
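
The radii tabulated in such work can be approximated numerically. The Python sketch below is not the author's method: it assumes an uncorrelated bivariate Gaussian centered at the origin with standard deviations `sigma_x` and `sigma_y` along the axes (a correlated distribution reduces to this case by rotating to its principal axes), integrates the density over a circle in polar coordinates with the radial integral done in closed form, and finds the radius by bisection. All names are hypothetical.

```python
import math


def prob_within_radius(r, sigma_x, sigma_y, steps=2000):
    """Probability that a centered bivariate Gaussian point falls within radius r."""
    total = 0.0
    for j in range(steps):
        theta = 2 * math.pi * (j + 0.5) / steps  # midpoint rule in the angle
        k = (math.cos(theta) / sigma_x) ** 2 + (math.sin(theta) / sigma_y) ** 2
        # Closed-form radial integral of rho * exp(-k * rho**2 / 2) from 0 to r.
        total += (1 - math.exp(-0.5 * r * r * k)) / k
    return total / (steps * sigma_x * sigma_y)


def circular_probable_error(sigma_x, sigma_y, p=0.5, tol=1e-10):
    """Radius of the centered circle enclosing probability p (bisection search)."""
    lo, hi = 0.0, 10 * max(sigma_x, sigma_y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_within_radius(mid, sigma_x, sigma_y) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the circular case sigma_x = sigma_y = sigma the result agrees with the known closed form sigma * sqrt(2 ln 2), about 1.1774 sigma; other percentages are obtained by changing `p`.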


(Presented December 29,1947 at the Chicago Meeting of the Institute) 

1. The Asymptotic Analogue of the Theorem of Cramér and Rao. Herman
Rubin, Institute for Advanced Study, Princeton, N. J.

The author generalizes the results of Cramér and Rao on the minimum variance of
estimates to the case of the asymptotic distribution of an estimate. He shows that if certain
regularity conditions are satisfied, the formula given by Cramér and Rao remains valid.
The main results are obtained in the case of consistent estimates, but with a stronger set
of hypotheses, the results remain true for estimates which are not consistent. The method
used to obtain these results is to construct statistics to which the theorem of Cramér and
Rao can be applied, and whose variance converges to the variance of the limiting
distribution. This procedure is also applied to the case in which there is no limiting distribution,
and in which two sequences of distributions are considered which act as if they approach
each other.
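
For reference, the classical inequality in question can be written as follows for an unbiased estimate T_n of a single parameter based on n independent observations with density f(x; θ). The notation is an assumption of this note and is not taken from the abstract; the second line is one reading of the asymptotic analogue, valid only under regularity conditions of the kind the abstract mentions.

```latex
\operatorname{Var}_\theta(T_n) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) = \operatorname{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right].

% Asymptotic reading: if \sqrt{n}\,(T_n - \theta) has a limiting distribution
% with variance v(\theta), then, under suitable regularity conditions,
v(\theta) \;\ge\; \frac{1}{I(\theta)}.
```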



BOOK REVIEWS 

Sequential Analysis. Abraham Wald. John Wiley and Sons, Inc. pp. vi, 212,
$4.00.

Reviewed by M. A. Girshick
Douglas Aircraft Company

The development of sequential analysis as a new tool of statistics is by and
large the work of Abraham Wald. This fact in itself would make the appearance
of a book by him on this subject an important event. However, Wald in
this book did more than discuss the present status of sequential theory. He
has, in fact, written a very lucid treatise on the general subject of statistical
inference, a treatise which is likely to have great influence on statistical
thinking.

While this book is not written for the mathematically untrained, a knowledge
of differential and integral calculus will suffice to follow all the arguments
except perhaps for some sections in the appendix where the more complicated
proofs have been placed.

The main body of this book is divided into 3 parts and 11 chapters. Part I,
covering chapters 1 to 4 inclusive, deals with the general theory of the sequential
probability ratio test. Chapter 1 introduces in an elementary fashion the notion
of probability distributions, tests of hypotheses and the Neyman-Pearson
theory of two-valued decisions based on a fixed sample size. In Chapter 2,
the general notion of a sequential test procedure is introduced and the operating
characteristics of such tests are discussed. Chapter 3 deals with the sequential
probability ratio test for testing a single hypothesis against a single alternative.
Here the boundaries of this sequential criterion are expressed in terms of the
risks, the operating characteristic and the average sample number functions
are developed and bounds are obtained for the errors arising from truncation
and neglect of excess over the boundaries. Chapter 4 presents a sequential
theory for testing simple and composite hypotheses against a set of alternatives.
The fundamental idea introduced is the concept of a weight function in the
parameter space which permits handling composite hypotheses, or simple
hypotheses with many alternatives, by means of the sequential probability ratio
test.




Part II of this book, consisting of chapters 5 to 9 inclusive, deals with the
... inspection with specific reference to lot-by-lot acceptance
... in this chapter is the derivation of the exact
... characteristic function for a large class of tests and the development of upper
... ASN curves. Chapter 6 deals with the problem of double dichotomies. A
procedure for testing the difference between the parameters of two binomial
distributions is developed for the fixed size as well as the sequential procedure.
Chapters 7, 8, and 9 are concerned with the application of sequential analysis to
the normal distribution. In these chapters the sequential probability ratio test
is applied to hypotheses concerning the mean of a normal distribution when the
variance is known, when the variance is not known (non-central t case) and
hypotheses concerning the variance when the mean is known and when the mean is
not known.
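
For readers who have not met the procedure under review, the sequential probability ratio test for a single hypothesis against a single alternative can be sketched in Python for the binomial case. This is a minimal illustration and not an excerpt from the book: the function name, the 0/1 coding of observations and the returned labels are assumptions, and the boundaries use Wald's well-known approximations A = (1 - beta)/alpha and B = beta/(1 - alpha), which neglect the excess over the boundaries.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Test H0: p = p0 against H1: p = p1 on a stream of 0/1 observations.

    Returns (decision, number of observations used). The decision is
    "continue" if the data run out before a boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    log_ratio = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "accept H1", n
        if log_ratio <= lower:
            return "accept H0", n
    return "continue", n
```

With p0 = 0.1, p1 = 0.3 and both risks set at 0.05, three successive defectives are enough to accept H1, while twelve successive non-defectives are needed to accept H0; the unequal sample sizes illustrate why the average sample number depends on the true parameter.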

Part III consists of two short chapters and deals with multi-valued decisions
and sequential interval estimation. The results in these chapters are not
definitive answers to the two outstanding problems in statistical inference but are
merely suggestive of a possible approach to them. Nevertheless, from the
point of view of stimulating future research these 2 chapters are perhaps the
most valuable sections of this book. The reader, having been exposed in the 
previous chapters to various tests the outcome of which is a two-valued decision, 
is naturally led in Chapter 10 to the consideration of tests the outcome of which 
is a multi-valued decision. The notion of a risk function, introduced elsewhere 
by the author in the non-sequential case, is again used as the main tool in handling 
multi-valued decisions sequentially. In Chapter 11 the important problem of 
setting up confidence intervals of fixed length by means of a sequential proce¬ 
dure is discussed and a possible method for accomplishing this is indicated. 

As was previously noted, the main theorems on sequential analysis are
contained in the Appendix and since they have all been previously published in the
Annals they will not be mentioned in the present review. The Appendix,
together with the main body of the book, forms a fairly exhaustive treatment of
sequential theory. A notable exception to this is the lack of any mention of the
published research on sequential point estimation. This is probably accounted
for by the fact that this research came too late to be included in the book. Other
minor omissions that may be noted are references to the generalization of the
Fundamental Identity to more than one dimension and other theorems on
sequences of functions of random vectors which have appeared in print. Also
no mention is made of the similarity of sequential analysis to the problems of
the random walk and the gambler’s ruin. This, in the opinion of the reviewer,
is regrettable.

This book will make a very suitable companion to the book Sequential
Analysis of Statistical Data: Applications prepared by the Statistical Research Group,
Columbia University (see review by J. W. Tukey, Ann. of Math. Stat. Vol.
xviii, 1947). While there is some overlap in the material covered, the two books
differ in emphasis. Wald's book, though not highly technical, is more in the
nature of a textbook on the theory and application of sequential analysis. The
SRG book, on the other hand, was prepared mainly for statisticians who may
wish to use sequential analysis in practice. The latter book is therefore more
detailed and puts less emphasis on the theoretical aspects of the sequential
procedure.

The book is surprisingly free of typographical errors which is a tribute to the 
high quality of the editorship. 





Statistical Methods. George W. Snedecor. Ames, Iowa: The Iowa State
College Press, Inc., 1946; pp. xvi, 485. $4.50.

Reviewed by Frederick Mosteller
Harvard University


Statistical Methods is a non-mathematical treatment of modern experimental
statistics. Few non-mathematical books are available that treat such topics
as confidence limits, use of transformations, and analysis of variance and
covariance in the detail presented by Snedecor. The examples are largely, but not
entirely, drawn from agriculture and animal husbandry. The exercises for
students are extensive and thought-provoking.

Unlike most non-mathematical texts the book under review does not spend
pages and pages on methods of recording frequencies and methods of computing
countless moments which are seldom used in the later developments of the text.
There is no long exasperating discussion of kurtosis and skewness; and there is
no parade of qualitative Greek names for categorizing frequency distributions.

The reviewer has used this book for teaching a second course in statistics to
social science majors with reasonable success. The main disadvantage was the
biological nature of most of the examples, but until some author writes a
comparable book using social science examples, the reviewer will continue to use
Snedecor’s material for a large part of the course.

The main differences between the Third and Fourth Editions of this text have
been adequately summarized by Snedecor:

“(i) greater emphasis has been placed on the theoretical conditions in which
the various statistical methods have validity, and concurrently (ii) on the conduct
of the experiment so as to incorporate in the data the information desired; (iii)
estimates and fiducial statements have been brought into equal prominence
with tests of hypotheses; (iv) there is increased reliance on experimental
samplings to exemplify distribution theory; (v) the treatment of correlation and of
experimental designs has been expanded; and (vi) the methods for
disproportionate subclass numbers have been extended to include all those necessary for
ordinary needs.” Some more obvious changes in the Fourth Edition are the
entirely new type and summaries which are included at the end of some of the
chapters. The practice of using random sampling numbers (iv) to help explain
theory has long been employed by teachers of statistics, but few authors have
taken as much advantage of this technique as has Snedecor. In the Fourth
Edition confidence intervals are widely used (iii). The author uses the
adjectives “confidence” and “fiducial” more or less interchangeably, but it is the
reviewer’s opinion that it is the Neyman concept rather than the Fisherian that
predominates. It should be remarked that this is one of the few texts that
give the students the idea that in linear regression we do not predict y with the
same accuracy for every x even when linearity and homoscedasticity hold (v).
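
The point about unequal accuracy of prediction can be made concrete with a short Python sketch. It is an illustration under the usual assumptions of simple linear regression, not an excerpt from the book, and the function name is hypothetical: the standard error of the estimated mean of y at a value x0 is s * sqrt(1/n + (x0 - mean of x)^2 / Sxx), which is smallest at the mean of the x's and grows as x0 moves away from it.

```python
import math


def prediction_standard_errors(x, y, x_new):
    """Standard errors of the fitted regression line at each value in x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual mean square on n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    return [math.sqrt(s2 * (1 / n + (x0 - x_bar) ** 2 / sxx)) for x0 in x_new]
```

With x = 1, 2, 3, 4, 5 the standard error at x0 = 5 is sqrt(3) times that at the mean x0 = 3, whatever the y values, and it is larger still outside the observed range.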

The main emphasis of the book is on the analysis of variance. The author
succeeds extremely well in showing the student how to carry out the analysis
even at rather complex levels. On some other points he was not quite so
successful. For example, the reviewer feels that the meaning of “interaction” was
never gotten across, and that for the student the higher order interactions are
still just things to be computed. Furthermore, in attempting to make sure
that the student understands how to do the computation the author often does
not encourage the student to take any overall view of the data before blindly
starting to compute. In addition, reasons for doing the experiment are
sometimes vague and the conclusions are often couched only in the jargon of analysis
of variance. Therefore, the student seldom gets an opportunity to find out
what kinds of recommendations might reasonably be made as the result of an
experiment. Perhaps the worst example is on pages 275-280. Here the
experiment deals with yield of wheat in 48 pots, with two series of soil treatments,
humus and chemical. Anyone glancing over the results of the experiment will
be startled to find that every yield from pots with “no humus treatment” (12
observations) is greater than any yield with “humus treatment” (36
observations). The reader will be further startled to find that all the evidence tends
to support the notion that “no chemical treatment” is at least as fruitful as any
of the chemical treatments tried. However, Snedecor says “The striking feature
of this experiment is the discrepance among the subclasses. The chemicals
applied to one humus treatment produced yields out of accord with those from
other humus treatments.” Snedecor then pushes on to a more subtle analysis.
The reviewer feels that here as elsewhere in the book the author occasionally
forgets that the extended analysis looks rather ridiculous unless the practicality
of applying the technique is discussed. The example considered here is one in
which the point could profitably be made that everyone can see from a visual
examination of the data what the results of the experiment show. The analysis
backs up the student’s common sense appraisal of the situation and gives him
more confidence in and understanding of the method when it is applied in more
delicate situations. It seems to the reviewer that too many times the
application of the analysis of variance obfuscates the main point of the experiment.
In the haste to get to the computations and the comparisons of interactions and
errors the author frequently neglects to impress the student with the
fundamental differences between means and their ultimate interpretation. However,
the author does bring out clearly the notion of the various estimates of variance,
a subject frequently neglected.

In the next to last chapter the binomial and Poisson distributions are discussed.
In this connection the inverse sine and the square root transformations are
treated briefly, as is the logarithmic transformation. It is surprising that no
indication is given of the theoretical variances when the inverse sine and square
root transformations are used. The theoretical discussion of the transformation
is limited to the remark that these transformations tend to make the variance
independent of the means, but there is no indication of the further advantages.
This is surprising because in a much earlier chapter the use of Fisher’s
transformation for correlation coefficients was treated quite adequately. It seems
to the reviewer that in a later edition the use of transformation might well be
moved forward in the book, and that the theoretical and practical implications
might be treated more thoroughly.
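
The theoretical variances referred to here are, to a first approximation, 1/(4n) for the inverse sine of the square root of a binomial proportion (angle in radians) and 1/4 for the square root of a Poisson count. The Python sketch below checks both by summing over the exact distributions; it is an illustration added for the reader, with hypothetical function names, and is not taken from the book under review.

```python
import math


def arcsine_variance(n, p):
    """Exact variance of asin(sqrt(X / n)) for X binomial(n, p), in radians squared."""
    mean = 0.0
    mean_sq = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        t = math.asin(math.sqrt(k / n))
        mean += prob * t
        mean_sq += prob * t * t
    return mean_sq - mean * mean


def sqrt_poisson_variance(lam):
    """Variance of sqrt(X) for X Poisson(lam), summed over a wide truncated range."""
    upper = int(lam + 20 * math.sqrt(lam) + 20)
    prob = math.exp(-lam)          # P(X = 0)
    mean = 0.0
    mean_sq = 0.0
    for k in range(upper + 1):
        mean += prob * math.sqrt(k)
        mean_sq += prob * k        # (sqrt(k))**2 = k
        prob *= lam / (k + 1)      # recurrence for P(X = k + 1)
    return mean_sq - mean * mean
```

For n = 50 and p = 0.3 the exact variance is close to 1/200, and for a Poisson mean of 20 it is close to 1/4; in both cases the value changes little as p or the mean changes, which is the stabilizing property the transformations are used for.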

As in most other texts the final chapter “Design and Analysis of Sampling”
needs very considerable expansion.

The book begins (Chapter 1) with a consideration of the sampling of attributes,
inferences that can be drawn about the population, confidence limits, use of
chi-square in a 1 x 2 table, and some discussion of the use of ratios, rates, and
percentages. Measurement data is then (Chapter 2) discussed, including the
computation and application of the mean, range, standard deviation, probable
deviation, median, and quartiles. The concepts of null hypothesis and
confidence limits are introduced in Chapter 2 and elaborated in Chapter 3 which
concerns sampling from a normally distributed population, random samples,
distribution of the mean, variance, standard deviation, and of t. The
comparison of two groups in contrast to individuals is treated in Chapter 4 including
groups with different numbers of individuals. Chapter 5 provides material on
short cut methods of computation using calculating machines; code numbers
are explained, suggestions about significant numbers and rates and percentages
are given, and the use of the ratio range/sigma is introduced.

After considering linear regression and correlation (Chapters 6, 7) the author
relates the two notions, and then goes on to consider some interesting special
cases of correlation. Chapter 8 deals with large sample methods. Chapter 9
concerns enumeration data with more than one degree of freedom, discusses
adjustments of chi-square and its computation with large numbers of degrees of
freedom, and describes the analysis of 2 x 2 x 2, R x 2, and R x C tables. The
computation of the analysis of variance for two or more groups of measurement
data and with two or more criteria of classification: variance ratio F, use of
Latin square, analysis with disproportionate subclass numbers, and the use of
randomized blocks are considered in Chapters 10 and 11, while analysis of
covariance is treated in Chapter 12 (22 pages). Multiple regression, including
partial and multiple correlation coefficients, tests of significance and confidence
limits are handled in Chapter 13 and curvilinear regression considered in Chapter
15. Chapter 16 deals with binomial and Poisson data, and Chapter 17 discusses
the design and analysis of sampling, including sampling from a homogeneous
or small population and the effectiveness of stratification.

It seems to the reviewer that at the present time one would be hard put to
find a better statistics text written at this level.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Dr. Franz L. Alt, who has been with the Econometric Institute, New York, 
as Assistant Director of Research, is now Deputy Chief of the Computing Labo¬ 
ratory at the Ballistic Research Laboratories, Aberdeen Proving Ground, Aber¬ 
deen, Maryland. 

Mr. A. George Carlton has accepted a position as Assistant Professor of
Mathematics at the University of Illinois. 

Assistant Professor Paul R. Halmos, University of Chicago, Chicago, Illinois,
is on leave for the academic year. He is spending the year at the Institute for
Advanced Study, Princeton, New Jersey on a Guggenheim Fellowship and
will return to the University of Chicago in September, 1948.

Mr. Henry F. Hebley of the Pittsburgh Coal Co. spent most of last summer
in Eastern Europe carrying out a survey on coal production and fuel
availability in Poland. This work was carried out in the interest of the International
Bank for Reconstruction and Development.

Dr. Harold D. Larsen, former Associate Professor at the University of New
Mexico, has joined the faculty of Albion College, Albion, Michigan. 

Mr. Dickson H. Leavens has resigned as Research Associate of the Cowles
Commission for Research in Economics. He will continue as Managing Editor
of Econometrica and may be addressed at 1632 Wood Avenue, Colorado Springs,
Colorado.

Professor S. B. Littauer, who has been Chairman of the Mathematics
Department, Newark College of Engineering, Newark, New Jersey, has now accepted
an associate professorship in the Department of Industrial Engineering, Columbia
University.

Professor Harris F. MacNeish, who has been Chairman of the Department of 
Mathematics at Brooklyn College since its foundation in 1930, has resigned 
to accept a visiting professorship in Mathematics at the University of Miami, 
Coral Gables, Florida.

Mr. Clifford J. Maloney has resigned a position as Research Associate in the
Statistical Laboratory of Iowa State College to serve as Chief, Statistics Branch,
Camp Detrick, Frederick, Maryland, an agency of the Chemical Corps of the
United States Army.

Mr. Monroe L. Norden, who has formerly been with the Ballistic Research 
Laboratories, Aberdeen Proving Ground, Maryland, has accepted a research 
position in theoretical or mathematical statistics at the Douglas Aircraft Co., 
Santa Monica, California. 

Mr. W. E. Pattee has resigned his position as statistical engineer with the
Canadian Industries Limited, Shawinigan Falls, Quebec and has accepted a
position as senior chemist, Ottawa Mill, E. B. Eddy Company, Hull, Quebec.



Mr. Robert I. Piper, who was formerly plant staff at the Southern
California Telephone Company of Los Angeles, has been transferred to the
systems office of the Pacific Telephone and Telegraph Company. He will assist
in planning and analysing sampling surveys of the wage rates prevailing in
the Pacific coast states in which the company operates.

Mr. Herbert Solomon, who was formerly an instructor at the College of the
City of New York, has accepted an assistant professorship in the Mathematics
Department, Newark College of Engineering, Newark 2, New Jersey.

Dr. A. G. Swanson, formerly an assistant chairman of the Department of
Mathematics and Mechanics at the General Motors Institute, Flint, Michigan, 
has accepted an associate professorship in the Department of Mathematics, 
Gustavus Adolphus College, St. Peter, Minnesota. 


A federal center of applied mathematics, the National Applied Mathematics
Laboratories, has been established as a division of the National Bureau of
Standards. The new organization is oriented around modern mathematical
statistics as applied to the physical and engineering sciences and to the
development and use of modern high speed computing. The applied mathematics
laboratories include four separate laboratories: the Institute of Numerical
Analysis; the Computation Laboratory; the Statistical Engineering Laboratory;
and the Machine Development Laboratory.

Two members of the Institute have been given important positions in this
organization. Dr. John Curtiss, who has been Director’s Assistant in Applied
Mathematics at the Bureau of Standards, has been named Chief of the National
Applied Mathematics Laboratories. Dr. Churchill Eisenhart has been
appointed head of the Statistical Engineering Laboratory.


Statistical Summer Sessions at the University of California, Berkeley 

Following the encouraging experience of last year the University of California
offers statistical programs in the two Summer Sessions of 1948. The teaching
staff is as follows:

Raj Chandra Bose, Professor of the University of Calcutta, India.

Miss Evelyn Fix, Lecturer at the University of California, Berkeley.

Erich L. Lehmann, Assistant Professor of the University of California,
Berkeley.

Michel Loève, Reader at the University of London, England.

Jerzy Neyman, Professor of the University of California, Berkeley.

Abraham Wald, Professor of Columbia University, New York.

Courses in statistics are offered on both the graduate and the undergraduate
levels. The graduate courses, all given during the First Summer Session, June 21
to July 31, are meant primarily for students who either have already obtained
their Ph.D. degree or are working towards it. Therefore, apart from formal
classes, it is proposed to hold extensive seminars in which the work of students
will be discussed. No specific prerequisites to graduate courses will be required.
However, to benefit from the courses, the students must be generally familiar
with the theory of statistics. In addition, course 272 and especially 271 will
require a reasonable knowledge of the theory of functions.

There will be two undergraduate courses offered, course S12 during the First
Summer Session, June 21 to July 31, and course S113 during the Second Summer
Session, August 2 to September 11. Both of these courses were recently
introduced into the curriculum and are prerequisites to more advanced courses
in statistics. They are offered during the Summer Sessions for the benefit of
students, otherwise advanced, who plan to attend more advanced courses in
statistics during the fall semester. Besides, course S12 is recommended for
students who do not intend to specialize in statistics but wish to acquire some
knowledge of this subject as a part of their general education.

The Statistical Laboratory will be available for students doing research. 

First Summer Session

S12. Elements of Probability and Statistics. Mr. Lehmann.

271. Random Functions. Mr. Loève.

272. Sequential Analysis. Mr. Wald.

273. Design of Experiments. Mr. Bose.

S290s. Seminar in Theory of Statistics. Mr. Loève, Mr. Wald.

290t. Seminar in Design of Experiments. Mr. Bose.

S295. Individual Research. Mr. Bose, Mr. Loève, Mr. Neyman, Mr. Wald.


Second Summer Session 

S113. Second Course in Probability and Statistics. Miss Fix.


Statistical Sessions at Alabama Polytechnic Institute 

Professor George W. Snedecor, President of the American Statistical
Association and Research Professor of Statistics at Iowa State College, will be Visiting
Research Professor of Statistics at Alabama Polytechnic Institute during the
Spring Quarter, from March 22 to June 4, 1948. Professor Snedecor will
lecture on Statistical Experimental Design and will be available for statistical
consultations.

The newly formed Statistical Laboratory at A.P.I. will also offer a course
in Survey Sampling during the Spring Quarter to be taught by the Director,
Professor T. A. Bancroft. Conferences in applied statistics for research workers
in the lower southeastern states are being scheduled during the time of
Professor Snedecor's visit.





New Members 

The following persons have been elected to membership in the Institute
(September 1 to November 30, 1947)


Afzal, M., M.A. (Panjab, India) Graduate student at Columbia Univ., 1038 John Jay Hall,
Columbia University, New York 31, New York.

Billeter, Ernest P., Ph.D. (Univ. of Basle) Scientific Assistant (Statistical Office, Zurich)
Turnerstrasse 33, Basle, Switzerland.

Bishop, David James, M.Sc. (London) Head of Operational Research Section of British
Iron and Steel Research Association, 11 Park Lane, London W. 1., England.

Brooks, Hamilton, B.See (Univ. of Pittsburgh) Design Engineer, Westinghouse Electric
Corp., P.O. Box 383 E, Pittsburgh, Pennsylvania.

Craw, Alexander R., M.S. (Univ. of Notre Dame) Instructor in Math., U. S. Naval
Academy, Annapolis, Maryland.

Edwards, Daisy M., A.M. (Columbia Univ.) Lecturer in Statistics, University of
London, Institute of Education, 1, Oakfield Court, Queens Road, Weybridge, Surrey,
England.

Havermark, K. Gunnar, Chief of Division, Royal Social Board, Lagerlofsg 8, Stockholm,
Sweden.

Hollingsworth, Charles A., Ph.D. (State Univ. of Iowa) Research Chemist, S 04 Maple
Ave., Waynesboro, Virginia.

Hurd, Cuthbert C., Ph.D. (Univ. of Ill.) Plant Statistician, Carbide and Carbon
Chemicals Corp., Oak Ridge, Tenn.

Isaacson, Stanley L., M.A. (Johns Hopkins Univ.) Graduate student at Columbia
Univ., 3S33 Loyola Southway, Baltimore, Maryland.

May, Kenneth, Ph.D. (Univ. of Calif.) Assistant Professor of Mathematics, Carleton
College, Northfield, Minnesota.

Mirsky, Robert, A.M. (Johns Hopkins Univ.) Graduate student at Columbia Univ.,
7 West lOSlh Street, Shanks Village, Orangeburg, New York.

Mulhall, Harold, B.Sc. (Sydney) Lecturer in Mathematics, Department of
Mathematics, University of Sydney, Australia.

Palm, Conny, Ph.D. (Stockholm) Docent, Ynglingar 11, Djursholm, Sweden.

Pease, Katharine, A.M. (Smith College) Instructor in Psychology, Barnard College,
Columbia University, New York 27, New York.

Peckham, Cyril G., M.S. (Univ. of Ill.) Assistant Professor of Mathematics, University
of Dayton, Dayton 9, Ohio.

Peterson, Raymond P., Jr., B.A. (Univ. of Calif., Los Angeles) Assistant in
Mathematics, University of California, Los Angeles, Calif., lOm Ashton Ave., Los Angeles
34, California.


Pike, Eugene "W., Ph.D., (Princeton) Member McFarlan, Groth <& Pike, 510 Audubon 
Ave., New York 33, New York. 

Pitman, Edwin J. G., M.A. (Univ. of Melbourne) Professor of Mathomatios, Univ. of 
Tasmania, Hobart, Tasmania. 

Rigby, Fred D., PhD., (Univ. of Iowa) Mathematician, Office of Naval Rosoaroh, P.0, 
_ Box 334 , Falls Church, Virginia. 

[...]Witt, Ph.D. (Univ. of Iowa) Associate Professor of Statistics, Box
2680, University, Alabama.

Srinivasan, K., M.A. (Madras) Assistant Lecturer, Mathematics Department,
Raja’s College, Pudukkottah, S-I-R, South India.

[...] Control Analyst, 4134 Ivanrest Road, Grandville, Michigan.

[...] Berkeley) Associate, School of Public Health,
3043 Wheeler St., Berkeley, California.





Trindade, Mario, Chief of the Statistical Division of the Instituto de Resseguros do Brasil,
Rua Senador Soares SS, ap. SOi, Rio de Janeiro, Brasil.

Von Schelling, Hermann, Ph.D. (Univ. of Berlin) Naval Medical Research Laboratory,
U. S. Submarine Base, New London, Conn.

Whidden, Phillips, A.B. (Harvard) Part-time Instructor in Mathematics, Carnegie Institute
of Technology, Pittsburgh 13, Pa.

Wolman, William, B.B.A. (College of City of New York) Statistician, New York State
Division of Housing, S95 Parkside Avenue, Brooklyn SB, New York.

Woodbury, Lowell A., Ph.D. (Univ. of Michigan) Assistant Professor of Physiology,
Dept. of Physiology, University of Utah Medical School, Salt Lake City 1, Utah.

Yusuf, Mohammad, M.A. (Aligarh Muslim Univ., India) Graduate student at Columbia
University, SOS Furnald Hall, Columbia University, New York 27, New York.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 

The thirtieth meeting of the Institute of Mathematical Statistics was held in
Berkeley, California, on Monday and Tuesday, December 22 and 23, 1947. The
meeting was attended by approximately 70 persons including the following 31
members of the Institute:

G. A. Baker, G. G. Bockstead, B. M. Bennett, R. U. Bonner, Frances L. Campbell, E. L.
Crow, Dorothy Cruden, W. J. Dixon, R. Dorfman, O. C. Eldredge, E. A. Fay, Evelyn Fix,
M. A. Girshick, J. Gurland, T. E. Harris, W. L. Hart, J. L. Hodges, Jr., P. G. Hoel, H. M.
Hughes, T. A. Jeeves, H. S. Konijn, G. M. Kuznets, E. L. Lehmann, R. B. Leipnik, J. Neyman,
Gladys Rappaport, H. Scheffé, T. W. Simpson, C. M. Stein, J. E. Walsh and H.
Working.

The Monday morning program, with Professor J. Neyman presiding, consisted
of the following contributed papers:

1. The Performance Characteristic of Certain Methods for Obtaining Confidence Intervals.
Mr. B. M. Bennett, University of California, Berkeley.

2. Some Further Results on the Bernoulli Process.
Dr. T. E. Harris, Douglas Aircraft Company.

3. Most Powerful Tests of Composite Hypotheses. I. Normal Distributions.
Dr. E. L. Lehmann and Dr. C. M. Stein, University of California, Berkeley.

4. On the Selection of Forecasting Formulas.
Professor P. G. Hoel, University of California, Los Angeles.

The Monday afternoon program, with Professor H. Scheffé presiding, also
consisted of contributed papers as follows:

1. On the Power Function of the "Best" t-test Solution of the Behrens-Fisher Problem.
Dr. J. E. Walsh, Douglas Aircraft Company.

2. On Sequences of Experiments.
Dr. C. M. Stein, University of California, Berkeley.

3. The Effect of Selection above Definite Lower Limits of Linear Functions of Normally
Distributed Correlated Variables on the Means and Variances of Other Linear Functions.
Professor G. A. Baker, University of California, Davis.

4. An Inversion Formula for the Distribution of a Ratio of Random Variables.
Dr. J. Gurland, University of California, Berkeley.

5. Independence of Parameters and Sufficient Statistics.
Dr. E. W. Barankin, University of California, Berkeley.

The Tuesday morning session, with Professor R. A. Gordon presiding, was
devoted to the following invited and contributed papers on econometrics:

1. Remarks on the Theory of Indices.
Professor G. C. Evans, University of California, Berkeley.

2. Interrelations of Theory and Statistical Research in Economics.
Professor H. Working, Stanford University.

3. Statistical and Case Methods in a Study of Labor Mobility.
Professor D. McEntire, University of California, Berkeley.
Discussion: Dr. M. Lipton, University of California, Berkeley.



4. Distributions Associated with Continuous Stochastic Processes.
Dr. R. B. Leipnik, University of California, Berkeley.

5. On Some Methods of Evaluating Railway Costs (By title).
Miss Evelyn Fix, University of California, Berkeley.

There was a dinner on Monday evening for members and guests at the Hotel 
Claremont and an informal discussion and coffee on Tuesday afternoon. 


REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The Tenth Annual Meeting of the Institute of Mathematical Statistics was
held at the Commodore Hotel, New York City, on December 28-30, 1947. The
meeting was held in conjunction with the American Statistical Association.
The following 173 members of the Institute were in attendance:

E. S. Acton, R. L. Anderson, H. E. Arnold, L. A. Aroian, M. Astrachan, H. M. Baldwin,
W. D. Baten, R. E. Bechhofer, G. W. Beebe, M. H. Belz, A. A. Bennett, A. J. Berman, A.
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, T. H. Brown, M. A. Brumbaugh,
M. C. Bruyere, P. T. Bruyere, T. A. Budne, R. W. Burgess, R. S. Burington, B. H. Camp,
G. C. Campbell, P. G. Carlson, Jr., U. Chand, H. Chernoff, Kai-Lai Chung, P. C. Clifford,
W. G. Cochran, D. D. Cody, J. Cornfield, G. M. Cox, J. H. Curtiss, J. F. Daly, G. B. Dantzig,
D. G. Delhi, H. F. Dorn, A. J. Duncan, C. W. Dunnett, D. Durand, J. Dutka, P. S.
Dwyer, G. L. Edgett, C. Eisenhart, B. Epstein, M. W. Eudey, W. D. Evans, Will Feller,
C. D. Ferris, C. B. Fine, M. M. Flood, L. R. Frankel, J. E. Freund, B. Friedman, Hilda
Geiringer, M. A. Geisler, H. H. Germond, M. A. Girshick, Abraham Golub, C. H. Graves,
S. W. Greenhouse, J. A. Greenwood, T. N. E. Greville, J. I. Griffin, E. J. Gumbel, M.
Gurney, K. W. Halbert, Max Halperin, M. H. Hansen, T. E. Harris, B. Harshbarger,
Alex Hart, P. M. Hauser, J. D. Heide, L. H. Herbach, M. W. Hirsch, Harold Hotelling, H.
M. Humes, C. C. Hurd, S. Jablon, C. M. Jaeger, A. S. Kaitz, Leo Katz, T. L. Kelley, L. S.
Kellogg, L. F. Knudsen, A. K. Kury, Jack Laderman, M. LeLeika, Joseph Lev, Howard
Levene, J. E. Lieberman, Julius Lieblein, S. B. Littauer, Eugene Lukacs, Geo. A. Lundberg,
J. C. McPherson, Benjamin Malzberg, Sophie Marcuse, E. S. Marks, H. C. Mathisen, J.
W. Mauchly, A. L. Mayerson, Margaret Merrell, E. B. Mode, E. C. Molina, M. E. Moore,
D. J. Morrow, J. E. Morton, Jack Moshman, Hugo Muench, D. N. Nanda, M. G. Natrella,
Doris Newman, G. E. Nicholson, Jr., Harold Nisselson, Nilan Norris, H. W. Norton, P. S.
Olmstead, A. L. O’Toole, A. E. Paull, C. N. Payne, Katherine Pease, M. P. Peisakoff, E.
W. Pike, O. A. Pope, G. B. Price, L. J. Reed, J. S. Rhodes, S. F. Robinson, A. C. Rosander,
Ernest Rubin, P. J. Rulon, Rose Sachs, Frank Saidel, Arthur Sard, M. M. Sandomire, F.
E. Satterthwaite, E. D. Schell, Bernice Scherl, O. N. Serbein, R. G. Seth, Harry Shulman,
Rosedith Sitgreaves, C. DeW. Smith, G. W. Snedecor, Herbert Solomon, D. E. South,
Arthur Stein, G. T. Steinberg, Joseph Steinberg, A. I. Sternhell, S. A. Stouffer, J. V. Sturtevant,
B. R. Suydam, W. R. Thompson, Gerhard Tintner, J. W. Tukey, D. F. Votaw, Jr.,
A. J. Wadman, H. M. Walker, Dzung-shu Wei, Sidney Weiner, Samuel Weiss, Sophie R.
Wilkey, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, Jacob Wolfowitz, W. J. Youden.

The first session, a joint session with the American Statistical Association, was
held on the morning of December 28 and was devoted to the topic The Teaching
of Statistics. Professor W. G. Cochran of North Carolina State College presided.
A paper entitled Three Recent Reports Dealing with the Teaching of Statistics,
the Training of Statisticians and the Crisis in Statistical Personnel was presented
by Dr. James D. Paris of the Metropolitan Life Insurance Company. Many
members participated in the general discussion which followed.

The second session on The Teaching of Statistics, also with the American Statistical
Association, was held at 1:15 P.M. Professor Francis G. Cornell of the
University of Illinois was chairman. The main paper of the session was the
paper by Professor George W. Snedecor of Iowa State College entitled Syllabus
for a Proposed Course in Basic Statistics. This was followed by prepared discussion
by Professors Elmer B. Mode, Boston University; Helen M. Walker,
Teachers College, Columbia University; Samuel A. Stouffer, Harvard University;
and Albert E. Waugh, Department of Economics, University of Connecticut.
Many members participated in the general discussion. At the
conclusion of this session, a film on Modern Quality Control was shown by Mr.
Simon Collier of the Johns-Manville Company.

Two Monday sessions, also held jointly with the American Statistical Association,
and with the cooperation of the Operations Evaluation Group of the
Navy and the Operations Analysis of the Air Force, were devoted to Operations
Research. Professor Edward L. Bowles of Massachusetts Institute of Technology
presided at the morning session. The following papers:

1. Operations Research in the Department of the Navy.
Dr. J. Steinhardt, Director, Operations Evaluation Group.

2. Operations Research in the Department of the Air Forces.
Dr. Leroy A. Brothers, Chief, Operations Analysis.

were followed by discussion by Dr. Arthur A. Brown, Operations Evaluation
Group, Dr. Thomas I. Edwards, Operations Analysis, Professor G. Baley Price,
The University of Kansas and Wartime Operations Analyst, and Dr. W. J.
Youden, Douglas Aircraft Company and Wartime Operations Analyst.

Dr. Merrill M. Flood, Assistant Deputy Director of Research and Development,
General Staff, U. S. Army, presided at the afternoon session. The following
papers were presented:

1. Operations Analysis in the Southwest Pacific Air War.
Dr. Roger I. Wilkinson, Bell Telephone Laboratories and Wartime Operations Analyst.

2. Operations Analysis of Air-Sea Rescue.
Dr. E. S. Lamar, Operations Evaluation Group.

3. Factorial Chi-Square in Test Shooting.
[...] Laboratory and Wartime [...]

4. Mathematical Techniques of Program Planning.
[...] George Dantzig, Consultant to the Air Comptroller, Headquarters, USAF.

A session on the Application of the Theory of Extreme Values was held jointly
[...] on Tuesday, December 30. Professor Jacob [...]





1. Introduction. The Mathematical Theory of Extreme Values.
Professor Richard Von Mises, Harvard University.

2. Applications to the Prediction of Flood Flows.
Professor Emil Gumbel, Brooklyn College.

3. Applications to Meteorology.
Mr. Horace Norton, Weather Bureau, Washington, D. C.

4. Applications to Fracture Problems.
Dr. Benjamin Epstein, Coal Research Laboratory, Carnegie Institute of Technology.

The session concluded with discussion by Miss Marion Sandomire, Navy Department,
Bureau of Ships, and Dr. Bradford Kimball, Port Washington, New York.

A session on Statistical Techniques in Life Insurance was held jointly with
the American Statistical Association at 1:15 P.M., December 30. Mr. Robert
J. Myers, Actuarial Consultant, Social Security Administration, was chairman
of the meeting. The following papers were presented:

1. Problems with Sampling Procedures for Reserve Valuations.
Mr. George C. Campbell, Supervisor, Actuarial Division, Metropolitan Life Insurance
Company.

2. Sampling Errors in Life Insurance Mortality and Other Statistics.
Mr. Donald Cody, Assistant Actuary, Equitable Life Assurance Society.

3. Recent Developments in Graduation and Interpolation.
Dr. T. N. E. Greville, National Office of Vital Statistics, U. S. Public Health Service.

A session of contributed papers was held at 3:30 P.M. on December 30. Dr.
T. N. E. Greville of the National Office of Vital Statistics presided. The following
papers were presented:

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted
Fourier Series (Preliminary Report).
Professor R. L. Anderson, North Carolina State College, and Professor T. W. Anderson,
Jr., Columbia University.

2. Some New Methods for Distributions of Quadratic Forms.
Professor Harold Hotelling, Institute of Statistics, University of North Carolina.

3. Frequency Functions Defined by the Pearson Difference Equation.
Professor Leo Katz, Michigan State College, East Lansing.

4. Distribution of the Sum of Roots of a Determinantal Equation Under a Certain Condition.
Mr. D. N. Nanda, Institute of Statistics, University of North Carolina.

5. Applications of Carnap’s Probability Theory to Statistical Inference.
Professor Gerhard Tintner, Department of Economics, Iowa State College.

6. Circular Probable Error of an Elliptical Gaussian Distribution.
Dr. H. H. Germond, S. W. Marshall & Co., Washington, D. C.

The annual business meeting of the Institute was held at 4:30 P.M., December
29, 1947 in the ball room of the Commodore Hotel. There were reports by the
President, Secretary-Treasurer, Mr. Morris Hansen, Chairman of the Committee
on Planning and Development, and Dr. John Curtiss, Chairman of the
Program Committee. Mr. Hansen presented a tentative form of the proposed
new constitution while Dr. Curtiss discussed program plans. There was some
discussion on these general questions from the floor.





Professor A. Wald was elected President, and Dr. Churchill Eisenhart and
Professor Henry Scheffé, Vice-Presidents.

Paul S. Dwyer,
Secretary.


REPORT ON THE CHICAGO MEETING OF THE INSTITUTE 


The thirty-second meeting of the Institute of Mathematical Statistics was
held at the Sherman Hotel, Chicago, Monday and Tuesday, December 29-30,
1947. The meeting was held in conjunction with the one hundred fourteenth meeting
of the American Association for the Advancement of Science and Co-operating
Associated Societies. The following twenty-eight members of the Institute
attended the meeting:

W. Bartky, D. H. Blackwell, G. M. Brown, I. W. Burr, A. G. Carlton, M. Castellanos,
C. W. Cotterman, A. T. Craig, J. H. Davidson, H. C. Davis, W. E. Deming, M. Elveback,
M. L. Garbuny, W. W. Gutzman, T. J. Jaramillo, E. S. Keeping, T. C. Koopmans, E. L.
Labti, M. M. Lavin, K. May, J. A. Pierce, O. Reiersol, H. Rubin, L. J. Savage, J. Silber,
W. A. Wallis, E. L. Welker and J. W. Wilkins.

The Monday afternoon session was devoted to contributed papers of Section A,
AAAS, and of the Institute, and to the Vice-Presidential address of Section A.
The following papers were presented:

1. On the Boundary Layer Motion along a Periodically Oscillating Plane in Compressible
Viscous Fluids.
Dr. M. Z. Krzywoblocki, University of Illinois.

2. Variations of the Probability of Unfair Election Results.
Dr. Kenneth May, Carleton College.

3. Normal Equations with Nearly Vanishing Determinants.
Dr. M. Herzberger and Dr. R. Noriis.

4. Composition of Binary Quadratic Forms.
Professor Gordon Pall, Illinois Institute of Technology.

5. A Proof of the Asymptotic Analogue of the Theorem of Cramér and Rao.
Dr. Herman Rubin, Institute for Advanced Study.

6. The Solution of Differential Equations in the Presence of Turning Points, Vice-Presidential
address of Section A.


The Tuesday afternoon session was also a joint session of Section A and the
Institute, with Dean Walter Bartky of the University of Chicago presiding.
The following two papers were presented upon invitation of the Institute:

1. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics.
Professor P. R. Halmos and Dr. L. J. Savage, University of Chicago.

2. Unbiased Sequential Estimation.
Professor David Blackwell, Howard University.



REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1947 


The healthy growth of the Institute has continued through 1947. The
membership increased from 900 to 1046. This increase is gratifying as a sign
that more and more people appreciate the usefulness of basic theory and are
ready to support research by making our Annals possible. It is also pleasing
to note that statistical theory and methodology are reaching new fields and
that new groups as a whole are becoming conscious of the usefulness of contact
with mathematical statistics. These developments are reflected in the meetings
of the Institute.

Meetings. The Ninth and Tenth Annual meetings (for 1946 and 1947) were
held in the traditional way in conjunction with the meetings of the American
Statistical Association (Atlantic City in January and New York at Christmas).
The Tenth Summer Meeting was held with the American Mathematical Society
and the Mathematical Association of America (Yale in September). Regional
meetings were held in California (San Diego in June, Berkeley in December) and
in Chicago (December), the latter in conjunction with the meetings of the
American Association for the Advancement of Science (AAAS). Moreover,
two meetings were organized with specialized programs of interest to groups
with whom the Institute has not previously had much contact. A meeting
in April at Columbia University, co-sponsored by the American Mathematical
Society, was devoted to Stochastic Processes and Random Noise, and another
meeting held simultaneously at Atlantic City was in conjunction with the meeting
of the Eastern Psychological Association. It is clear that with such diversified
meetings the Program Committee could not always act as a unit. J. H. Curtiss
was its Chairman and J. Neyman and J. W. Tukey arranged some of the programs.
Other members of the Committee were: C. W. Churchman, T.
Koopmans, F. C. Mosteller, J. Neyman, H. Scheffé, J. Wolfowitz, and H.
Working.

At the Tenth Summer Meeting A. Wald delivered the first Henry L. Rietz 
Memorial Lecture. It is desirable to preserve the solemnity of the occasion
of the Rietz lectures and it was therefore decided that they should not be given 
every year. Accordingly, no Rietz lecturer has been selected for 1948. 

The Institute had no share in the program of the International Statistical
Congress in Washington. However, Fellows of the Institute were invited to
that Congress. This Congress and the Princeton Bi-Centennial were beneficial
by establishing more intimate personal ties with our European colleagues. It is
widely felt on both sides of the ocean that a closer cooperation, in particular
with British statisticians, is highly desirable. Various suggestions in that
direction were informally discussed in Washington and Princeton and M. G.
Kendall has kindly consented to explore the practical possibilities. It is needless
to say that the Institute is eager to do everything possible to promote cooperation
and increase its usefulness also to our British colleagues.



Relations with other organizations. It is gratifying to note that the cooperation
of the Institute with sister societies is growing in intensity. The last two Presidential
reports mentioned plans for a reorganization of the American Statistical
Association with a view to more intimate relations among statistical societies.
The revision of the constitution of the Association is not yet completed. It appears
now that the American Mathematical Society also feels the need of closer
collaboration with all groups interested in applied mathematics. It is too early to
predict the results of these movements but it is clear that we must devote careful
thought to our own organization and to our future relations with other groups.

In 1947 the AAAS organized an Inter-Society Committee for the National
Science Foundation Legislation. At the first meeting in Washington we were
represented by J. H. Curtiss and W. A. Shewhart and at the meeting in December
in Chicago by W. Bartky. In ballots on the two controversial subjects the
Institute voted against exclusion of social sciences and abstained on the question
of patent rights. W. Feller represented the Institute on the Policy Committee
of the American Mathematical Society. Through this Committee the Institute
went on record as favoring the National Science Foundation Bill. Otherwise
the discussions of the Policy Committee were mostly connected with the establishment
of an International Mathematical Union. Cletus O. Oakley represented
the Institute on the Publicity Committee of the American Mathematical
Society of which he is chairman. G. W. Snedecor was our representative on the
AAAS Council, W. Bartky on the National Research Council, F. C. Mosteller
and S. S. Wilks on the Joint Committee for the Development of Statistical
Application in Engineering and Manufacturing. In recent years the common
interests of the Institute and the actuarial profession have grown in importance
and it has been suggested that closer cooperation would be beneficial to both
parts. A new committee has been established to explore these possibilities
and in particular to arrange a joint meeting during 1948. Members of this
committee are: G. C. Campbell, T. N. E. Greville, C. Fisher, C. Spoerl,
Chairman.


Internal Work. The growth of the Institute has rendered parts of the Constitution
obsolete and a revision seems indicated. In particular, it appears that
the present system of elections is no longer satisfactory. The Institute is deeply
indebted to its Committee on Planning and Development which has devoted
much thought and consideration not only to a revision of the Constitution but
also to the future development of the Institute as a whole. The membership
had occasion to discuss the preliminary plans at two business meetings. M. H.
Hansen acted as Chairman of the Committee; other members were: J. H. Curtiss,
H. W. Norton, F. F. Stephan, J. W. Tukey, W. A. Wallis.

A sharp increase in printing costs has, unfortunately, necessitated an increase
in membership dues. However, the membership should rest assured that the
[...] intrinsically sound. The cash prospects
for 1948 are not rosy, but this is due principally to the necessity of reprinting
back-numbers of the Annals which in itself is a sign of health and promise of
stability. At present the Institute has a considerable reserve in back numbers
and this reserve is rapidly being transformed into cash. We are also exploring
the possibilities of new revenue and have started a campaign to get advertisements
for the Annals. A possible campaign for institutional members is held
in abeyance pending a clarification of our formal relations with sister societies.
In order to make the Annals available in European countries with monetary
exchange restrictions, the dues and subscriptions have been increased only for
the Western Hemisphere. The investments of the Institute have been supervised
by the Finance Committee consisting of C. F. Roos, L. A. Knowler, F. F.
Stephan, and Paul S. Dwyer, Chairman.

Last year’s Committee on Teaching completed its work and submitted a
detailed report which will be of great value. It will be published in the Annals
of Mathematical Statistics. The Committee has been dissolved with special
thanks of the Board of Directors for their successful work. H. Hotelling was
chairman and its members were Walter Bartky, W. Edwards Deming, Milton
Friedman, and Paul Hoel. The Committee on Tabulation under the chairmanship
of C. Eisenhart and consisting of Paul S. Dwyer, H. Goldstine, A. Lowan,
H. W. Norton, and G. R. Stibitz has outlined the work for the coming years
which promises to be of great interest.

The Membership Committee consisted of C. C. Craig, P. G. Hoel, and J. H.
Curtiss as Chairman. On its recommendations the following members were
elected Fellows: T. W. Anderson, David Blackwell, Frederick Mosteller, Gerhard
Tintner, Charles P. Winsor, Alexander Aitken, George Darmois, Ragnar Frisch,
Robert C. Geary, and John Wishart. The Nominating Committee consisted of
Meyer A. Girshick, Paul G. Hoel, Horace W. Norton, Frederick Mosteller,
and George W. Snedecor, Chairman. A. Wald was nominated for President,
and as an innovation four nominations for Vice-presidents were made: C.
Eisenhart, A. M. Mood, Henry Scheffé, F. F. Stephan.

The Annals of Mathematical Statistics are covered by a special report of the
Editor. However, it is appropriate to say that the Institute takes pride in the
development of the Annals. While members see only its spectacular success,
they should bear in mind that this is mostly due to the work of one man, S. S.
Wilks. In view of the great variety of interests of our membership and the
many desirable directions in which the Annals could develop, it is clear that
the work of the Editor can not always be pleasing and naturally often means a
nervous burden. I feel sure that I speak for all our members in expressing the
Institute's sincere thanks to S. S. Wilks not only for his work but also for his
wisdom in striking a sensible balance between many wishes and possibilities
and leading the Annals so successfully in a direction satisfactory to all of us.

In thanking all other members who have contributed to the work of the Institute,
it is hard to find appropriate words to express appreciation for the unselfish
efforts and devotion of our Secretary-Treasurer. Few members will
realize how much of Dwyer’s time and thoughts are spent for the Institute
and how much the smooth running of the affairs of the Institute is due to his
hard work.

Finally, it is a pleasant duty to express our thanks and appreciation to Princeton
University and to the University of Michigan. These institutions have
generously provided office space and other help which has greatly facilitated
our work and saved us expenses.

Will Feller,
President, 1947.

December 31, 1947. 



REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE FOR 1947 


At the beginning of 1947 the Institute had 900 members and during 1947, 
210 new members (10 of which begin their membership with 1948) joined the 
Institute. During 1947 the Institute lost 73 members, 43 by resignation, 25 
by suspension for non-payment of dues, and 5 by death. The Institute has 
1,037 members as it starts 1948. 
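
The membership figures in the paragraph above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python; the variable names are chosen here for the purpose and are not part of the report.

```python
# Membership reconciliation for 1947, using only the figures quoted in the report.
members_start = 900      # members at the beginning of 1947
new_members = 210        # joined during 1947 (10 of these begin with 1948)
losses = {"resignation": 43, "suspension": 25, "death": 5}

total_lost = sum(losses.values())
members_end = members_start + new_members - total_lost

print(total_lost)    # 73
print(members_end)   # 1037
```

The three categories of loss add up to the 73 members reported lost, and the resulting total agrees with the 1,037 members reported at the start of 1948.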

The following members died during the year: 

Margaret J. Dix
Professor Irving Fisher
Albert M. Freeman
Professor Henry A. Ruger
Professor James G. Smith

A summary of the financial transactions of the Institute is given in the Financial
Statement for 1947 which follows:


FINANCIAL STATEMENT

December 31, 1946 to December 31, 1947

A. Receipts

Balance on Hand,* December 31, 1946 ................   $7,241.55
Dues ...............................................    5,054.43
Life Membership Payments ...........................      287.50
Subscriptions ......................................    2,892.93
Sale of Back Numbers ...............................    3,969.95
Net Income from Investments ........................       63.00
Miscellaneous ......................................       76.56
Total ..............................................  $19,585.92

B. Expenditures

Annals (Current)
  Office of Editor .................................     $160.40
  Waverly Press ....................................    7,145.79     $7,306.19
Annals (Back Numbers)
  Reprinted 500 copies each Vol. III #1 & 2; IV #2;
    V #2; VII #1 & 4; XII #1; XIV #1, 2 & 3 ........    3,039.00
  Iowa City Office .................................      143.75      3,182.75
Mathematical Reviews and Inter-Society for National
  Science Foundation ...............................                    135.00
Office of the Secretary-Treasurer
  Printing, memoranda, etc. (including some stamped
    envelopes) .....................................    1,100.49
  Postage, supplies, express, telephone calls and
    cables .........................................      400.00
  Clerical help ....................................    1,502.31      3,002.80
Miscellaneous ......................................                    100.81
Balance on Hand,* December 31, 1947 ................                  5,858.37
Total ..............................................                $19,585.92

* In bank deposits and government bonds.

C. Summary of Receipts and Expenditures

Balance on Hand,* December 31, 1946 ................   $7,241.55
Receipts during 1947 ...............................   12,344.37
Expenditures during 1947 ...........................   13,727.55
Balance on Hand,* December 31, 1947 ................    5,858.37


D. Comparison of Assets on December 31, 1946 and December 31, 1947

                                                            1946          1947
U. S. Government G Bonds ...........................   $5,000.00     $3,000.00
Life Membership Funds ..............................    1,888.00      1,888.00  (Bonds)
                                                          139.50        427.00  (Bank Dep.)
Additional Bank Deposits ...........................      214.05        543.37
Current Accounts Receivable ........................      452.62        423.55
Estimated Value (Cost) of back issues of Annals** ..    7,234.58     10,866.73
Total ..............................................  $14,928.75    $17,148.65
Net Gain 1947 ......................................                  2,219.90


E. Liabilities of Institute of Mathematical Statistics as of December 31, 1947

All bills which have been presented have been paid. The Life Membership Fund now
contains $2,315.00 which covers 30 members. Also $3,348.11 has been paid in for 1948
(and later) dues and subscriptions.


The increase in the size of the Annals from 500 to 600 pages and the phenomenal
activity in the sales of back numbers are the two most important factors
to be considered in comparing the 1947 statement with those of previous years.
The Waverly Press bills for 1946 totalled $4,566.27 while the corresponding
amount for 1947 was $7,145.79, an increase of 56%. The increase is attributable
not only to the increased size of the Annals but also to the fact that printing
costs are rising rapidly and, to a less extent, to the fact that we are printing a
larger number of copies. It is to be noted that the cost of the Annals alone in
1947 was over $2,000 more than the amount received from dues. As a result
of the increase in dues, the 1948 report should be more satisfactory in this respect.
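
The comparison in the preceding paragraph can be reproduced directly from the quoted amounts. The sketch below is illustrative only: it takes the two Waverly Press totals from the text, and assumes that "the cost of the Annals alone" refers to the $7,306.19 subtotal for the current Annals and that dues received were the $5,054.43 shown in the Financial Statement. Variable names are chosen here for illustration.

```python
from decimal import Decimal

# Waverly Press bills as quoted in the report.
waverly_1946 = Decimal("4566.27")
waverly_1947 = Decimal("7145.79")

# Assumed from the Financial Statement: current Annals subtotal and dues received.
annals_current_1947 = Decimal("7306.19")
dues_1947 = Decimal("5054.43")

increase_pct = (waverly_1947 - waverly_1946) / waverly_1946 * 100
excess_over_dues = annals_current_1947 - dues_1947

print(round(increase_pct, 1))   # 56.5
print(excess_over_dues)         # 2251.76
```

On these figures the printing bill rose by roughly 56 percent, and the current Annals cost exceeded dues by a little over $2,250, in line with the "over $2,000" stated in the report.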

The phenomenal sales in back issues, noted in the report for 1946, were accelerated
in 1947. We sold nearly $4,000 of back issues. These extensive
sales were embarrassing to our cash position since they exhausted many of our
issues and the continued reprinting forced us to place a considerable portion of
our reserves in inventory (some of which probably will not be returned to cash
within decades). Eleven issues were reprinted during the first six months of
1947. The resulting low cash position forced a temporary change in the policy
of reprinting issues as they became exhausted.

** Cost of Annals calculated at 67 cents per copy.

It was necessary to cash two $1000 interest bearing G bonds to meet the
Waverly and reprinting bills as they came due. These brought $1938.00 rather
than $2000 as they have been valued in previous reports. As the income from
bonds during the year was $125, I have entered the net income from investments
as $63.00.
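
The net investment income follows from the two figures given in the paragraph above. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Net income from investments, from the figures quoted in the report (in dollars).
bond_interest = 125          # income from bonds during 1947
carried_value = 2 * 1000     # two $1000 G bonds as valued in previous reports
cash_received = 1938         # amount actually realized on cashing them

loss_on_redemption = carried_value - cash_received
net_investment_income = bond_interest - loss_on_redemption

print(loss_on_redemption)      # 62
print(net_investment_income)   # 63
```

The $62 shortfall on the two bonds is set against the $125 of interest, which leaves the $63 entered in the statement.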

An attempt has been made to keep down the costs of the office of the Secretary- 
Treasurer. The expense for 1947 was about $100 more than the expense for 
1946 and seems very satisfactory in view of the larger membership and greatly 
increased costs of all materials and services. 

For the reasons indicated above, the cash position (including bonds and Life
Membership payments) was lowered during the year by $1,383.18. This is
compensated for by an increase in the value of the stock of back issues (valued
at cost) of $3,632.15. Some members of the finance committee feel that it is
improper to list all of this stock as assets since we can probably sell only a portion
of it in the next five or ten years. However, we did sell nearly $4,000 of Annals
in 1947 and it is indicated (at the new prices) that the sales of issues we have
now on hand will yield us $11,000 in the next five to ten years.

Many of the issues which were stored in Iowa City have been sold and 
Professor Knowler has sent the remaining issues to Ann Arbor. I wish to 
acknowledge the work of Professor Knowler in caring for these issues and to express the 
appreciation of the Institute for his efforts over a period of years. I also wish 
to express my appreciation to Mr. Carl Bennett who contributed much time 
and energy in looking after the back issues at Ann Arbor. 

This report does not cover the amount of $390.20 which is held temporarily 
by the Institute for the fund for Annals for Countries Devastated by War. 
Arrangements are being made to purchase Annals for certain institutions which 
the Committee is recommending. 

Paul S. Dwyer, 
Secretary-Treasurer. 

December 31, 1947. 



REPORT OF THE EDITOR FOR 1947 


During the past year the increase in the number of manuscripts submitted 
to the Annals has continued. More manuscripts have been received from 
foreign countries than in any preceding year. During 1947 papers were 
published by authors in Argentina, Australia, Canada, England, France and Sweden. 
If manuscripts continue to be received at the present rate it will not be possible 
to publish them in the Annals without further expansion. The gap between 
receipts of manuscripts and publication is likely to become serious by the end 
of 1948. The 1947 volume of the Annals contained 50 papers of which 25 were 
short notes. The total number of pages printed was 618, representing an 
increase of approximately 11% over the size of the 1946 volume. It now appears 
that increased printing costs will prevent a further increase in the size of the 
Annals for 1948. It is therefore extremely important that authors submitting 
papers to the Annals make every effort to keep their papers as brief as possible. 

Contributions to probability and statistical theory are continuing to come 
in from a wide variety of fields. They were written by biologists, chemists, 
economists, mathematical statisticians, mathematicians and physicists, 
representing universities, government agencies and laboratories, business and 
industrial organizations. Some of these contributions are rather heterogeneous 
in quality of results and presentation. However, patient attempts are being 
made to have all papers with novel and interesting results suitably revised and 
published. Attempts to have expository papers prepared are being continued. 

The Editor wishes to take this opportunity to acknowledge, on behalf of the 
Editorial Committee, the generous refereeing assistance which has been given 
by the following persons: L. A. Aroian, Z. W. Birnbaum, David Blackwell, 
A. H. Bowker, I. W. Burr, G. W. Brown, K. L. Chung, W. J. Dixon, T. N. E. 
Greville, F. E. Grubbs, J. B. S. Haldane, T. E. Harris, C. Hastings, L. Henkin, 
G. A. Hunt, B. P. Kimball, T. Koopmans, S. Kullback, E. L. Lehmann, H. 
Levene, H. B. Mann, P. J. McCarthy, W. E. Milne, R. Otter, M. P. Peisakoff, 
H. E. Robbins, L. J. Savage, F. F. Stephan, D. F. Votaw, and J. E. Walsh. 

The Editor is also indebted to the following persons at Princeton University 
for preparation of manuscripts for the printer, and other editorial and office 
assistance: Miss Jacqueline G. Foster, M. F. Freeman and J. E. Walsh. 


December 31, 1947. 


S. S. Wilks, 
Editor. 





CONSTITUTION AND BY-LAWS 
OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 


Constitution 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
Members excepted, who have been members for twenty-three months prior to the date 
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term 
as determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a 
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be one 
year and that of the Secretary-Treasurer three years. Elections shall be by majority 
ballots at Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term 
of two years, and another for a term of three years. Thereafter the Board of Directors 
shall elect from among the Fellows one member annually at their first meeting after their 
election for a term of three years. The President shall designate one of the 
Vice-Presidents as Chairman of this Committee. 

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 



time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall 
be given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President 
may be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board may 
be held from time to time at the call of the President or any two members of the Board. 
Notice of each meeting of the Board, other than the two regular meetings, together with 
a statement of the business to be brought before the meeting, must be given to the 
members of the Board by the Secretary-Treasurer at least five days prior to the date set 
therefor. Should other business be passed upon, any member of the Board shall have the 
right to reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the 
call of the Chairman or any member of the Committee provided notice of such call and 
the purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that 
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall 
constitute a quorum. At a regularly convened meeting of the Committee on 
Membership, two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any 
regularly convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 





By-laws 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and 
Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of 
the President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the 
time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at 
the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the correspondence 
of the Institute. Subject to the direction of the Board, he shall have charge of the 
archives and other tangible and intangible property of the Institute and upon the direction 
of the Board he shall publish in the Annals of Mathematical Statistics a classified list of all 
Members and Fellows of the Institute. He shall send out calls for annual dues and 
acknowledge receipt of same; pay all bills approved by the President for expenditures 
authorized by the Board or the Institute; keep a detailed account of all receipts and 
expenditures, prepare a financial statement at the end of each year and present an abstract 
of the same at the annual meeting of the Institute after it has been audited by a Member 
or Fellow of the Institute appointed by the President as Auditor. The Auditors shall 
report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the 
responsibility for all editorial matters concerning the editing of the Annals of Mathematical 
Statistics. He shall, with the advice and consent of the Board, appoint an Editorial 
Committee of not less than twelve members to co-operate with him; four for a period of five 
years, four for a period of three years, and the remaining members for a period of two 
years, appointments to be made annually as needed. All appointments to the Editorial 
Committee shall terminate with the appointment of a new Editor. The Editor shall 
serve as editorial adviser in the publication of all scientific monographs and pamphlets 
authorized by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to the 
Committee on Membership. The Board shall have authority to fill all vacancies ad 
interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time to 
carry on the affairs of the Institute. The power of election to the different grades of 
Membership, except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. The Committee shall review these qualifications 
periodically and shall make such changes in these qualifications and make such recommendations 
with reference to the number of grades of membership as it deems advisable. The power 
to elect worthy applicants to the grades of Member and Junior Member shall reside in 
the Committee, which may delegate this power to the Secretary-Treasurer, subject to 
such reservations as the Committee considers appropriate. The Committee shall make 
recommendations to the Board of Directors with reference to placing members in other 
grades of membership. The Committee shall give its attention to the question of 
increasing the number of applicants for membership and shall advise the 
Secretary-Treasurer on plans for that purpose. 


ARTICLE II 


Dues 


1. Members shall pay seven dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members and Fellows 
shall pay seven dollars annual dues. Honorary members shall be exempt from all dues. 

A Sustaining Member shall pay annual dues of a multiple of one hundred dollars. 

An approved nominee of a Sustaining Member shall be a member in good standing 
without payment of dues for each year in which he is nominated provided that in that 
year he has been a member for less than three years. 

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, their dues 
shall each be reduced by twenty-five per cent. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding annual dues and which will not 
otherwise alter his status as a Member or Fellow and will be based upon a suitable table and 
rate of interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Fellow of the Institute serving, except as a 
commissioned officer, in the Armed Forces of the United States, or of a friendly power, will, upon 
notification to the Secretary-Treasurer, be excused from the payment of dues until the 
January first following his discharge from service or his commissioning as an officer. He 
shall have all privileges of membership except that he shall not receive the Official Journal. 
However, during the first year of his resumed membership he may elect to receive one 
copy of each volume of the Official Journal published during the period of his service 
membership by paying one-half of the total of dues excused. 

(d) Exception. Anyone who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 


2. Annual dues shall be payable on the first day of January of each year. 

[...] shall be for a subscription to the Official Journal. Fifteen dollars of the dues 
of each Sustaining Member shall be for two subscriptions to the Official Journal, and 
the binding of one copy. 

[...] a Sustaining Member shall be entitled to nominate two persons for membership 
in the Institute. 

[...] the Secretary-Treasurer to notify by mail anyone whose dues may be six months 
in arrears, and to accompany such a notice by a copy of this article. [...] the date of 
mailing such notice, the Secretary-Treasurer shall report the delinquent to the Board 
of Directors. 





The Board of Directors may strike the delinquent’s name from the rolls and withdraw 
all privileges of membership, and may reinstate the delinquent upon payment of arrears 
of dues. 


ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 

Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed 
amendment has been previously approved by the Board of Directors. 




THE ANNALS 
of 

MATHEMATICAL 
STATISTICS 

The Official Journal of the Institute 
of Mathematical Statistics 

Contents 

Discriminant Functions with Covariance. W. G. Cochran and C. I. Bliss 

On the Kolmogorov-Smirnov Limit Theorems for Empirical Distributions. 
W. Feller 

Application of Recurrent Series in Renewal Theory. Alfred J. Lotka 

Solution of Equations by Interpolation. W. M. Kincaid 

Estimation of a Parameter when the Number of Unknown Parameters 
Increases Indefinitely with the Number of Observations. Abraham Wald 

The Factorial Approach to the Weighing Problem 

Multiple Sampling for Variables 














DISCRIMINANT FUNCTIONS WITH COVARIANCE 
By W. G. Cochran and C. I. Bliss 

North Carolina State College; Connecticut Agricultural Experiment Station and 

Yale University 

1. Summary. This paper discusses the extension of the discriminant 
function to the case where certain variates (called the covariance variates) are known 
to have the same means in all populations. Although such variates have no 
discriminating power by themselves, they may still be utilized in the discriminant 
function. 

The first step is to adjust the discriminators by means of their ‘within-sample’ 
regressions on the covariance variates. The discriminant function is then 
calculated in the usual way from these adjusted variates. The standard tests of 
significance for the discriminant function (e.g. Hotelling’s $T^2$ test) can be 
extended to this case without difficulty. A measure is suggested of the gain in 
information due to covariance and the computations are illustrated by a 
numerical example. The discussion is confined to the case where only a single function 
of the population means is being investigated. 

2. Introduction. Discriminant function analysis is now fairly well advanced 
for the case where there are only two populations. The data consist of a number 
of measurements, called the discriminators, that have been made on each member 
of a random sample from each population. The technique has various uses. 
Fisher [1] used it in seeking a linear function of the measurements that could be 
employed to classify new observations into one or other of the two populations. 
He pointed out [2] that a test of significance of the difference between the two 
samples, developed from his discriminant, was identical with Hotelling’s 
generalization of Student’s $t$ test, discovered some years earlier [3]. Mahalanobis’ 
concept of the generalized distance between two populations [4] was also found to 
be closely related to the discriminant function. In any of these applications 
(to classification, testing significance, or estimating distance) we may also be 
interested in considering whether certain of the measurements really contribute 
anything to the purpose at hand, and helpful tests of significance are available 
for this purpose. 

Recently the authors encountered a problem in which it seemed advisable to 
combine discriminant function analysis with the analysis of covariance. This 
case occurs whenever, in addition to the discriminators, there is a measurement 
whose mean is known to be the same in both populations. Suppose, for example, 
that the I.Q.’s of each of a sample of students are measured. The sample is 
then divided at random into two groups, each of which subsequently receives a 
different type of training. Measurements made at the end of the period of 
training would be potential discriminators, but in the case of the initial I.Q.’s we can 


clearly assume that there is no difference in the means of the populations cor¬ 
responding to the two groups. 

The initial I.Q. measurements are of course of no use in themselves in studying 
differences introduced by the training. Nevertheless, if they are correlated with 
the discriminators, they may serve in some way to ‘improve’ the discriminant: 
e.g. to increase the power of Hotelling’s test, or to reduce the number of errors 
in classification. This paper discusses the problem of utilizing such measure¬ 
ments, which will be called covariance variates. The problem is analogous 
to that which is solved by the analysis of covariance. In covariance, as applied 
for instance in a controlled experiment, variates that are unaffected by the 
experimental treatments can be used to provide more accurate estimates of the 
effects of the treatments or to increase the power of the F test of the differences 
among the treatment means. 

The procedure suggested is as follows. First, the multiple regression is 
obtained of each discriminator on all the covariance variates. These regressions 
are calculated from the ‘within-sample’ sums of squares and products: that is, 
from the sums of squares and products of deviations of the individual 
measurements from their sample means. Each discriminator is then replaced by its 
deviations from the multiple regression, and a new discriminant function is 
calculated in the usual way from these deviations. The extensions of Hotelling’s 
$T^2$ and Mahalanobis’ distance are both obtained from this discriminant, though a 
further adjustment factor is needed for tests of significance. 
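
The procedure just described can be sketched for the two-sample case as follows. This is only an illustrative sketch, not the authors' computing scheme: the function names, the use of NumPy, and the simulated data are all assumptions made for the example, and the further adjustment factor needed for tests of significance is not included.

```python
import numpy as np

def within_ssp(groups):
    """Pooled 'within-sample' sums of squares and products of deviations
    of each observation from its own sample mean."""
    return sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

def adjusted_discriminant(x_groups, c_groups):
    """x_groups: two (n_k, p) arrays of discriminators.
    c_groups: two (n_k, q) arrays of covariance variates.
    Returns the within-sample regression coefficients B (q, p) and the
    discriminant coefficients L (p,) computed from the adjusted variates."""
    p = x_groups[0].shape[1]
    joint = [np.hstack([x, c]) for x, c in zip(x_groups, c_groups)]
    W = within_ssp(joint)
    W_xc, W_cc = W[:p, p:], W[p:, p:]
    # Multiple regression of each discriminator on all covariance variates.
    B = np.linalg.solve(W_cc, W_xc.T)
    # Replace each discriminator by its deviation from that regression.
    resid = [x - c @ B for x, c in zip(x_groups, c_groups)]
    # Discriminant 'in the usual way': coefficients proportional to
    # (within SSP of adjusted variates)^-1 (difference of adjusted means).
    diff = resid[0].mean(axis=0) - resid[1].mean(axis=0)
    L = np.linalg.solve(within_ssp(resid), diff)
    return B, L

# Hypothetical simulated data: one covariance variate, two discriminators.
rng = np.random.default_rng(0)
c1, c2 = rng.normal(size=(20, 1)), rng.normal(size=(25, 1))
x1 = 0.8 * c1 + rng.normal(size=(20, 2)) + np.array([1.0, 0.5])
x2 = 0.8 * c2 + rng.normal(size=(25, 2))
B, L = adjusted_discriminant([x1, x2], [c1, c2])
```

Only the ratios of the coefficients in `L` matter for the discriminant, so any convenient scaling may be applied afterwards.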

This paper is arranged in three parts. Part I presents a numerical example. 
The decision to place the example first was taken because most of the actual 
applications of the discriminant function in the literature appear to have been 
made by persons relatively unfamiliar with the theory of multivariate analysis. 
It is hoped that with the aid of the example readers in this class may be able to 
utilize covariance variates. For the same reason, the calculations have been 
presented as far as possible in terms of the operations of ordinary multiple 
regression, rather than in the form in which they first emerge from the theory. 
Actually, various equivalent methods of calculation are available, and it is not 
claimed that our method is necessarily the best. A mathematical statistician 
may prefer to follow the computing methods which come directly from theory 
(Part II, section 13). 

The example is more complex in structure than the two-sample case. The 
data constitute a two-way classification, in which the row means are nuisance 
parameters, being of no interest, while only a single linear function of the column 
means is of interest. It is well known that the ordinary $t$ test can be applied 
not only to the difference between two sample means, but to any linear function 
of a number of sample means in data that are quite complex. Discriminant 
function technique can be extended in the same way, and readers familiar with 
the analysis of variance should find no great difficulty in making the appropriate 
extension to such data. 

Part II presents the theory. The reader who is primarily interested in theory 





should read Part II before Part I. Since the approaches used by Mahalanobis, 
Hotelling and Fisher all converge, we have chosen that of Mahalanobis, mainly 
because the extension of his techniques to include covariance variates seems 
straightforward. Maximum likelihood estimation of the generalized distance 
is presented in full for the two-population case. The frequency distribution of 
the estimated distance and the extension of the $T^2$ test are worked out. An 
attempt is also made to obtain a quantity that will measure what has been gained 
by the use of covariance. 

In order to illustrate how the theory applies with other types of data, the 
mathematical model is given for the row by column classification that occurs in 
the example. The major results for this model are indicated, though without 
proof. 

In Part III it is shown that the computational methods used in the example 
are equivalent to those developed by theory. While this can easily be verified 
in a particular case, it is not intuitively obvious. 

PART I. Numerical Example 

3. Description. The data form part of an experiment on the assay of insulin 
of which other parts have been published [5]. Twelve rabbits were used. 
Each rabbit received in succession four doses of insulin, equally spaced on a log 
scale. An interval of eight days or more elapsed between successive doses, and 
the order in which the doses were given to any rabbit was determined by 
randomization. Thus the experiment is of the ‘randomized blocks’ type, where each 
rabbit constitutes a block and there are 12 blocks with 4 treatments each. 

The effect of insulin is usually measured by some function of the blood sugar 
of the rabbit in periodic bleedings after injection of the insulin. The blood sugar 
was measured for each rabbit at 1, 2, 3, 4, and 5 hours after injection, and also 
before injection. In order to simplify the arithmetic, only the initial blood sugar 
and the blood sugars at 3 and 4 hours after injection will be considered here. 
These data are shown for the first three rabbits (with totals for all 12 rabbits) 
in Table 1. 

Let $x_{iwz}$ be a typical observation of blood sugar, where $i = 3, 4$ stands for the 
hour after injection, $w$ for the rabbit and $z$ for the dose. The mathematical 
model to be used is as follows. 

(1)    $x_{iwz} = \mu_i + \rho_{iw} + \gamma_{iz} + \beta_{i0}(x_{0wz} - \bar{x}_{0\cdot\cdot}) + e_{iwz}$. 

The parameters $\mu_i$, $\rho_{iw}$ and $\gamma_{iz}$ represent the true mean and the effects of rabbit 
and log dose respectively. The quantity $x_{0wz}$ is the initial blood sugar for the 
rabbit $w$ before the test at dose $z$, while $\bar{x}_{0\cdot\cdot}$ is the average initial blood sugar over 
the whole experiment. The blood sugar at $i$ hours has been found experimentally 
to be correlated with the corresponding initial blood sugar, and the relationship 
is represented here as a linear regression, with $\beta_{i0}$ as the regression coefficient. 
The residuals $e_{iwz}$ are assumed to follow a multivariate (in this case bivariate) 
normal distribution, with zero means. The covariance between $e_{iwz}$ and $e_{jwz}$ 
is taken as $\sigma_{ij\cdot0}$. The model is the standard one for the ordinary analysis of 
covariance, except that we have two measures of the effect of insulin, $x_3$ and $x_4$. 

One additional assumption was made. For all post-injection readings, the 
blood sugar seemed linearly related to the log dose $l_z$. Since this result has been 
found in other experiments, we assumed that 

$\gamma_{iz} = \delta_i l_z$, 

where $\delta_i$ is the regression coefficient of blood sugar on log dose. 

4. Object of the analysis. Our object was to find the linear combination of 
the three blood sugar readings that would measure best the effect of the insulin. 
Because of the linearity of the regression on log dose, the effect of insulin on each 


TABLE 1 

Sample of original data on blood sugar levels in insulin experiment 

           Initial blood sugar x0      Three hours x3            Four hours x4 
Rabbit     Log dose                    Log dose                  Log dose 
No.         .32   .47   .62   .77      .32   .47   .62   .77     .32   .47   .62   .77 
1            76    94   107    94       95    76    67    60     90?   96?   115    91 
2            91    80    83    93       98    90    77   69?     104    87   90?    89 
3            97    99    90    91       84    70    59    48     93?   102    85    90 
Total*     1065  1074  1121  1070      932   872   731   ...     ...   ...   ...   ... 

*12 rabbits. 


$x_i$ is known completely if the slope $\delta_i$ is known. It seems reasonable to choose 
the linear compound of the $x_i$’s which will give the maximum ratio when its 
estimated regression on log dose is divided by the estimated standard error of 
this regression. We now consider how to obtain this maximum. The argument 
given below is not intended to prove the validity of the method, for which 
reference should be made to Part II. 

The true regression of the original blood sugar $x_0$ on log dose is known to be 
zero. [...] it enables [...] estimate [...] rabbits. From the 
standard theory of covariance the best estimate is the regression coefficient 
$b_{i0} = E_{i0}/E_{00}$, where $E$ denotes a sum of squares or products calculated from 
the error line of the analysis of covariance, that is from the sums of squares and [...] 





The regression of the blood sugar at each hour on the log dose of insulin is 
calculated from totals adjusted for the regression on $x_0$. Since the 4 successive 
log doses ($z = 1, 2, 3, 4$) are spaced equally, they may be replaced in the 
computation by the coded doses $-3$, $-1$, $+1$, and $+3$. If we let $T_{iz}$ be the total 
blood sugar, summed over 12 rabbits, at the $i$th hour with dose $z$, the following 
result is well known for the analysis of covariance. The best estimate of 
$\delta_i$ ($i = 3, 4$) is 

$[(-3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}) - b_{i0}(-3T_{01} - T_{02} + T_{03} + 3T_{04})]/240$. 

The divisor, 240, is $12(3^2 + 1^2 + 1^2 + 3^2)$. The expression may be written 

$\dfrac{d_i'}{240} = \dfrac{(d_i - b_{i0}d_0)}{240}$, 

where 

$d_i = -3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}$. 
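
As a small worked illustration of the coded-dose contrast, the sketch below evaluates $d_0$ from the dose totals for initial blood sugar as they appear to read in Table 1 (1065, 1074, 1121, 1070). The function names are invented for the example, and the values of $d_3$ and $b_{30}$ passed to `slope_estimate` are hypothetical placeholders, not results from the paper.

```python
CODED_DOSES = (-3, -1, 1, 3)   # equally spaced log doses, coded
N_RABBITS = 12

def dose_contrast(totals):
    """d = -3*T1 - T2 + T3 + 3*T4 for the four dose totals of one variate."""
    return sum(c * t for c, t in zip(CODED_DOSES, totals))

# Divisor of the slope estimate: 12 * (3^2 + 1^2 + 1^2 + 3^2) = 240.
DIVISOR = N_RABBITS * sum(c * c for c in CODED_DOSES)

def slope_estimate(d_i, b_i0, d_0):
    """Slope per coded-dose unit, adjusted for the regression on x0:
    (d_i - b_i0 * d_0) / 240."""
    return (d_i - b_i0 * d_0) / DIVISOR

# Dose totals for initial blood sugar x0, as read from Table 1.
x0_totals = (1065, 1074, 1121, 1070)
d0 = dose_contrast(x0_totals)          # -3195 - 1074 + 1121 + 3210 = 62

# Hypothetical values, for illustration only.
example_slope = slope_estimate(d_i=-1000.0, b_i0=0.5, d_0=d0)
```

Because $x_0$ is measured before injection, $d_0$ differs from zero only through sampling variation, which is why the adjustment $b_{i0}d_0$ is usually a small correction to $d_i$.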

A linear combination is formed from $d_3'$ and $d_4'$, the numerators in the best 
estimates of $\delta_3$ and $\delta_4$, by means of the coefficients $L_3$ and $L_4$. $L_3$ and $L_4$ are 
computed so as to maximize the ratio of 

$d_I' = L_3d_3' + L_4d_4'$ 

to its estimated standard error. 

From the definition of $d_I'$, this requires a discriminant of the form 

$I = L_3(x_{3wz} - b_{30}x_{0wz}) + L_4(x_{4wz} - b_{40}x_{0wz})$, 

where each $x$ is measured from its mean. 

We require next the estimated standard error of $d_I'$. This depends, in turn, 
upon the variances of $d_3'$ and $d_4'$ and their covariance. As usual in the analysis 
of variance we have 

(5)    $V(d_3') = V(d_3) + d_0^2V(b_{30}) = \sigma_{33\cdot0}\left(240 + \dfrac{d_0^2}{E_{00}}\right)$. 

The residual variance $\sigma_{33\cdot0}$ is estimated from the sums of squares and products 
in the error row of the analysis of covariance as 

$s_{33\cdot0} = E_{33\cdot0}/n = (E_{33} - E_{30}^2/E_{00})/n$, 

where $n$ is the degrees of freedom in each $E_{ij}$ diminished by one. Similar methods 
lead to the variance of $d_4'$ and to the covariance of $d_3'$ and $d_4'$. It follows that the 
true variance of $d_I'$ may be written 

(6)    $V(d_I') \propto L_3^2\sigma_{33\cdot0} + 2L_3L_4\sigma_{34\cdot0} + L_4^2\sigma_{44\cdot0}$, 

where the factor $\left(240 + \dfrac{d_0^2}{E_{00}}\right)$ in equation (5) is omitted since it does not involve 
the $L$’s. Similarly, the estimated variance of $d_I'$, apart from constant factors, 
may be written as 

(7)    $L_3^2E_{33\cdot0} + 2L_3L_4E_{34\cdot0} + L_4^2E_{44\cdot0}$. 

The quantity to be maximized is therefore 

$\dfrac{L_3d_3' + L_4d_4'}{\sqrt{L_3^2E_{33\cdot0} + 2L_3L_4E_{34\cdot0} + L_4^2E_{44\cdot0}}}$. 

Formally, this is the same type of quantity that is maximized in ordinary analysis 
with the discriminant function. Differentiation with respect to the $L$’s leads 
to the equations (after omission of another constant factor) 

(8)    $E_{33\cdot0}L_3 + E_{34\cdot0}L_4 = d_3'$,    $E_{34\cdot0}L_3 + E_{44\cdot0}L_4 = d_4'$. 

The objective of the computation, therefore, is to obtain discriminant coefficients 
having the same ratio to each other as $L_3$ and $L_4$ in equations (8). As will be 
shown in the next section, this can be accomplished in practice more conveniently 
by substituting an alternative set of three simultaneous equations for the two 
in equations (8). 


5. Calculations. The first step is to form the sums of squares and products
in the analysis of covariance. With 12 rabbits and 4 doses, the conventional
breakdown of each total sum is into components for rabbits (11 d.f.), doses
(3 d.f.) and rabbits × doses (33 d.f.). Because of the assumed linear regression
on log dose, the sum of squares for doses was further divided into two
components. The first (1 d.f.) is the contribution due to this regression. For $x_i$,
the sum of squares due to regression is $d_i^2/240$, or in the case of $x_3$, $(1164)^2/240$
or 5645. The remaining component (2 d.f.) is called the curvature, since it
measures the effect of deviations from the linear regression. The sum of squares
for curvature is found by subtraction.

The following points may be noted. (i) For both $x_3$ and $x_4$, the F ratio of the
curvature mean square to the rabbits × doses mean square will be found to be
less than 1, so that the data do not suggest rejection of the hypothesis of a linear
regression on log dose. (ii) The F ratios of the regression mean squares to the
rabbits × doses mean squares are highly significant, being 57.8 for $x_3$ and 28.7
for $x_4$. This indicates, incidentally, that the three-hour reading may be a more
responsive measure of the effect of insulin than the four-hour reading. (iii) With
$x_0$, the F ratio does not approach significance for either the regression or the
curvature, as is to be expected.

A consequence of the assumption of linear regression on log dose is that the
curvature mean squares and products are estimates of the same quantities as
the rabbits × doses mean squares and products. Consequently the lines for
curvature and rabbits × doses could have been pooled to give the
error sums of squares or products, $E_{ij}$, etc. We decided, however, to estimate
the error only from the 33 d.f. for rabbits × doses. This was done because it
seemed to facilitate a test of the curvature of the final discriminant $I$. (This
test will not be reported here.)

The $L$'s could now be obtained from equations (8). In this case the first
equation would contain the terms

$$E_{33.0} = 3223 - (1259)^2/2351; \qquad E_{34.0} = 1200 - (1259)(1340)/2351;$$

$$d_3' = d_3 - b_{30}d_0 = -1164 - \frac{1259}{2351}(62),$$

leading to the simultaneous equations

$$2548.8\,L_3 + 482.4\,L_4 = -1197.2$$
$$482.4\,L_3 + 2373.2\,L_4 = -844.3,$$

which give $L_3/L_4 = -.41848/-.27070 = 1.5459$.
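As a check on the arithmetic, the adjustment for $x_0$ and the solution of equations (8) can be sketched in a few lines. This is a minimal illustration using the error-line values of Table 2 and the values of $d_0$, $d_3$, $d_4$ quoted in the text; the variable names are ours, and Cramer's rule is used only because the system has two unknowns.

```python
# Error-line (rabbits x doses) sums of squares and products, from Table 2.
E00, E33, E44 = 2351.0, 3223.0, 3137.0
E03, E04, E34 = 1259.0, 1340.0, 1200.0
# Numerators of the regressions on the coded doses, as quoted in the text.
d0, d3, d4 = 62.0, -1164.0, -809.0

# Adjust the error sums and the d's for the covariance variate x0.
E33_0 = E33 - E03 ** 2 / E00
E34_0 = E34 - E03 * E04 / E00
E44_0 = E44 - E04 ** 2 / E00
d3_adj = d3 - (E03 / E00) * d0
d4_adj = d4 - (E04 / E00) * d0

# Solve the two equations (8) by Cramer's rule.
det = E33_0 * E44_0 - E34_0 ** 2
L3 = (d3_adj * E44_0 - d4_adj * E34_0) / det
L4 = (E33_0 * d4_adj - E34_0 * d3_adj) / det
ratio = L3 / L4

print(round(L3, 5), round(L4, 5), round(ratio, 4))
```

Only the ratio of the two coefficients matters for the discriminant, so any common multiplier may be dropped.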

TABLE 2

Sums of squares and products

Component               d.f.   $x_0^2$   $x_3^2$   $x_4^2$   $x_0x_3$   $x_0x_4$   $x_3x_4$
Between rabbits          11      886      9376     11165      1952       2477       9206
Between doses             3      168      5806      2810      -247        -98       3981
  (Reg. on log dose)      1       16      5645      2727      -301       -209       3924
  (Curvature)             2      152       161        83        54        111         57
Rabbits × doses          33     2351      3223      3137      1259       1340       1200
Total                    47     3405     18405     17112      2964       3719      14387

Instead of using these equations, we propose to solve alternatively the set of
three equations

$$S_{00}L_0 + S_{03}L_3 + S_{04}L_4 = d_0$$
(9) $$S_{30}L_0 + S_{33}L_3 + S_{34}L_4 = d_3$$
$$S_{40}L_0 + S_{43}L_3 + S_{44}L_4 = d_4,$$

where each $S_{ij}$ $(i, j = 0, 3, 4)$ is the sum of squares or products formed by adding
the error line in the analysis of variance to the line for regression on log dose.
Thus $S_{ij}$ has 34 d.f. The ratio of $L_3$ to $L_4$, as found from equations (9), is exactly
the same as that found from the original equations (8), as is proved in section 18.
Further, the solution of the new equations seems to be more useful for performing
tests of significance, as will appear in following sections.
Accordingly, the first step after forming the analysis of variance is to set up
the three equations (9).




The equations are solved by means of the inverse matrix. The values of $d_i$
on the right side of the equations are replaced successively by 1, 0, 0, by 0, 1, 0
and by 0, 0, 1 to obtain the three sets of values for $L_0$, $L_3$ and $L_4$. These results
are given in the first three columns of Table 4 and are designated as $c_{ij}$.

The $L$'s follow from the $d_i$ by the usual rule for regressions. For example,

$$L_3 = \{(.003209)(62) + (.227781)(-1164) + (-.199655)(-809)\}\cdot 10^{-3} = -.103417$$


TABLE 3

Equations for determining $L_3$ and $L_4$

$$2367L_0 + 958L_3 + 1131L_4 = 62$$
$$958L_0 + 8868L_3 + 5124L_4 = -1164$$
$$1131L_0 + 5124L_3 + 5864L_4 = -809$$


The composite response or discriminant, adjusted for the covariance variate, is
now taken as

$$I = L_3\left(x_3 - \frac{E_{30}}{E_{00}}x_0\right) + L_4\left(x_4 - \frac{E_{40}}{E_{00}}x_0\right)$$

or

$$I = -.103417\left(x_3 - \frac{1259}{2351}x_0\right) - .066883\left(x_4 - \frac{1340}{2351}x_0\right)$$

$$= .093503x_0 - .103417x_3 - .066883x_4.$$

Note that the value of $L_0$ is not used at this stage and that $L_3/L_4 = 1.546$ agrees
with the value found from equations (8).
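The solution of the three equations of Table 3 and the resulting coefficient of $x_0$ in the discriminant can be verified numerically. The sketch below is illustrative only: the helper `solve` is our own small Gauss-Jordan routine, not a procedure described in the paper, and the inputs are the entries of Table 3 together with the error-line values 1259, 1340 and 2351 from Table 2.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# "Sum" line (error plus regression on log dose), as in Table 3.
S = [[2367.0, 958.0, 1131.0],
     [958.0, 8868.0, 5124.0],
     [1131.0, 5124.0, 5864.0]]
d = [62.0, -1164.0, -809.0]

L0, L3, L4 = solve(S, d)

# Error-line regressions of x3 and x4 on x0.
b30 = 1259.0 / 2351.0
b40 = 1340.0 / 2351.0
coef_x0 = -(L3 * b30 + L4 * b40)

print(round(L0, 6), round(L3, 6), round(L4, 6), round(coef_x0, 6))
```

The ratio $L_3/L_4$ obtained this way differs from 1.5459 only in the fourth figure, because the regression line of Table 2 is recorded to the nearest whole number.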


TABLE 4

Inverse matrix (× 10^3) and $L$'s

          ($10^3 c_{ij}$)                    $d_i$       $L_i$
  .465408    .003209   -.092568          62      .100008
  .003209    .227781   -.199655       -1164     -.103417
 -.092568   -.199655    .362846        -809     -.066883


A similar method may be followed when there are more discriminators or more
covariance variates. With two covariance variates, $x_0$ and $x_0'$, for instance, the
adjusted discriminant would be

$$L_3(x_3 - b_{30}x_0 - b_{30}'x_0') + L_4(x_4 - b_{40}x_0 - b_{40}'x_0'),$$

where $b_{30}$, $b_{30}'$ are the partial regression coefficients of $x_3$ on $x_0$, $x_0'$ respectively,
determined from the error line, and similarly for $x_4$. Further, since any linear
function of the column (dose) means may be represented as a regression on some
variate $t$, this method may be applied to any linear function of the column means
in which we are interested, provided that the mathematical model is appropriate.

6. Test of the regression of the adjusted discriminant on log dose. The
numerator of the regression of $I$ on the coded doses is

$$d_I' = L_3(d_3 - b_{30}d_0) + L_4(d_4 - b_{40}d_0).$$

Since the regressions of $x_3$ and $x_4$ on the coded doses were both significant, it
may be confidently expected that the regression of $I$ will also be significant. The
test of significance will, however, be given in case it may be useful in other
applications. For those who are familiar with multiple regression, the test is perhaps
most easily made by means of a device due to Fisher [2].

Construct a dummy variate $y$ such that $y$ is always equal to $t$, or in our
case to the coded doses. That is, $y$ takes the value −3 for all observations at
the lowest dose level, and −1, +1, and +3 respectively for all observations at
the successive higher dosage levels. We shall show that equations (9) solved
in finding the $L$'s are formally the same as a set of normal equations for the linear
regression of $y$ on $x_0$, $x_3$, and $x_4$.

The following analysis for $y^2$ and $yx_i$ may easily be verified.

TABLE 5

Analysis of $y^2$ and $yx_i$

                                     d.f.    $y^2$    $yx_i$
Rabbits                               11       0       0
Doses                                  3     240     $d_i$
  Regression on log dose               1     240     $d_i$
  Curvature                            2       0       0
Rabbits × doses = error               33       0       0
Sum = Error plus reg. on log dose     34     240     $d_i$
Total                                 47     240     $d_i$
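The $y^2$ entries of this analysis can be checked directly. The sketch below assumes, as the degrees of freedom imply, that each of the 12 rabbits is observed once at each of the 4 coded doses; the variable names are ours.

```python
coded = [-3, -1, 1, 3]          # coded doses, one per dose level
n_rabbits = 12

# y[r][j] is the dummy variate for rabbit r at dose j; it depends on the dose only.
y = [[float(t) for t in coded] for _ in range(n_rabbits)]
n_doses = len(coded)

grand = sum(sum(row) for row in y) / (n_rabbits * n_doses)

total_ss = sum((v - grand) ** 2 for row in y for v in row)
rabbits_ss = sum(n_doses * (sum(row) / n_doses - grand) ** 2 for row in y)
dose_means = [sum(y[r][j] for r in range(n_rabbits)) / n_rabbits
              for j in range(n_doses)]
doses_ss = sum(n_rabbits * (m - grand) ** 2 for m in dose_means)
error_ss = total_ss - rabbits_ss - doses_ss

print(total_ss, rabbits_ss, doses_ss, error_ss)
```

Since $y$ is exactly linear in the coded dose, the whole of the doses sum of squares belongs to the regression component and none to curvature.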

It will be noted that the sum of products of $y$ and $x_i$ in the sum line is $d_i$.
Further, $S_{ij}$ is the sum of products of $x_i$ and $x_j$ for this line. It follows that the
normal equations for the regression of $y$ on the $x$'s, as calculated from the "sum"
line, are

$$S_{i0}L_0 + S_{i3}L_3 + S_{i4}L_4 = d_i \qquad (i = 0, 3, 4).$$

These are just the equations solved in obtaining the $L$'s. Consequently, $L_3$
and $L_4$ are the partial regression coefficients of $y$ on $x_3$ and $x_4$. A test of the null
hypothesis that the true values of $L_3$ and $L_4$ are both zero can be made by the
standard method for multiple regression, as will be shown later from theory.
This test is equivalent to a test of the hypothesis that the true value of $d_I'$ is zero.

To apply the test, we require three items in the analysis of variance of $y$.
First, the total sum of squares for the Sum line, already seen to be 240 (Table 5).
Second, the reduction due to a regression on all variates (covariance variates
plus discriminators). By the usual rules for regression, this is (from Table 4)

$$L_0d_0 + L_3d_3 + L_4d_4 = (.100008)(62) + (-.103417)(-1164) + (-.066883)(-809) = 180.69.$$

Finally, we need the reduction due to a regression on the variates that are not
being tested, i.e. on the covariance variates alone. From Table 4, the reduction
in this case is simply $d_0^2/S_{00}$ or $(62)^2/2367$, or 1.62. The difference, 180.69 − 1.62,
represents the reduction due to the regression of $y$ on $x_3$ and $x_4$, after fitting $x_0$.
The resulting analysis is given below, the degrees of freedom being apportioned
by the usual rules.

TABLE 6

Analysis of variance of dummy variate $y$

                                                        d.f.     S.S.     M.S.
Reduction to regression on covariance variates            1      1.62
Additional reduction to regression on discriminators      2    179.07    89.54
Deviations                                               31     59.31    1.913
Total (from Sum line)                                    34    240.00

The F ratio, 89.54/1.913, or 46.80, with 2 and 31 d.f., is used to test the null
hypothesis that the adjusted discriminant has no real regression on log dose.
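The entries of this analysis of variance follow from a few products and differences. The following sketch reproduces them from the $L$'s and $d$'s quoted above; it is an illustration with our own variable names, and small differences in the last figure arise because the paper rounds intermediate values.

```python
# Coefficients and right-hand sides quoted in the text (Tables 3 and 4).
L = {"x0": 0.100008, "x3": -0.103417, "x4": -0.066883}
d = {"x0": 62.0, "x3": -1164.0, "x4": -809.0}
S00 = 2367.0                      # sum-line sum of squares for x0
total_ss, total_df = 240.0, 34    # sum line for the dummy variate y

# Reduction due to regression on all three variates.
reduction_all = sum(L[k] * d[k] for k in L)
# Reduction due to regression on the covariance variate alone.
reduction_cov = d["x0"] ** 2 / S00
# Additional reduction due to the two discriminators.
additional = reduction_all - reduction_cov

deviations = total_ss - reduction_all
df_dev = total_df - 3             # three fitted coefficients
F = (additional / 2) / (deviations / df_dev)

print(round(reduction_all, 2), round(reduction_cov, 2),
      round(additional, 2), round(deviations, 2), round(F, 2))
```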

7. Test of particular discriminators. Another useful test is that of the null
hypothesis that a particular discriminator, or group of discriminators, contributes
nothing to the adjusted discriminant. In other words, this is a test of the null
hypothesis that the true values of a subset of the $L$'s are all zero. The test is of
interest in the present investigation, since it would be useful to know whether
all five hourly readings of the blood sugar are really helpful. As might be
expected by analogy with the previous section, the test is made by calculating the
additional reduction due to the regression of $y$ on the particular subset of the
$L$'s in question.

The test will be illustrated with respect to $L_4$. One method of making the
test is to re-solve the normal equations with $L_4$ omitted. From this solution
the reduction due to a regression of $y$ on $x_0$ and $x_3$ alone is obtained. The
additional reduction due to a regression on $x_4$ is found by subtraction from 180.69.
However, the additional reduction can be found directly from the well-known
regression theorem that it is equal to $L_4^2/c_{44}$. The $c$'s have already been found
in Table 4. The result is $(.066883)^2/(.000362846)$, or 12.33. This value is
tested against the residual error mean square of 1.913, F having 1 and 31 d.f.
The contribution is found to be significant.
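The shortcut $L_4^2/c_{44}$ is a one-line calculation. The sketch below evaluates it and the corresponding F ratio from the values quoted in the text; the variable names are ours.

```python
L4 = -0.066883           # coefficient of x4, from Table 4
c44 = 0.000362846        # diagonal element of the inverse matrix (Table 4, times 10^-3)
residual_ms = 1.913      # deviations mean square, 31 d.f.

# Additional reduction due to x4 after fitting x0 and x3.
additional_x4 = L4 ** 2 / c44
# F ratio with 1 and 31 d.f.
F_x4 = additional_x4 / residual_ms

print(round(additional_x4, 2), round(F_x4, 2))
```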

In fact, by this process a kind of estimated standard error can be attached to
each of the $L$'s for the discriminators, using the formula $s\sqrt{c_{ii}}$, where $s$ is the
residual root mean square. Thus for $L_3$ (−.103417), the 'standard error' is
$\sqrt{(1.913)(.000227781)}$, or .0209. It should be stressed that at this point the
analogy with regression is rather thin. The $L$'s are not normally distributed,
nor do the estimated standard errors follow their usual distribution. It is,
however, correct that if the true value of $L_i$ is zero, $L_i/s\sqrt{c_{ii}}$ follows the t
distribution with 31 d.f. Thus, if omission of some discriminators seems warranted,
these t ratios are relevant in deciding which variate to eliminate first. Strictly
speaking, the $c$'s should be re-calculated after each elimination before deciding
which other discriminators might also be discarded.

TABLE 7

Analysis of variance for regression of $y$ on the discriminators

                d.f.     S.S.     M.S.
Regression        2    159.20    79.60
Deviations       32     80.80    2.525
Total            34    240.00

8. Estimation of the gain due to covariance. The tests given above enable us
to state whether the discriminators contribute significantly, in the statistical
sense. It is also of interest to investigate what has been gained by the use of
the covariance variates. From the practical point of view, the question: "What
is the gain from covariance?" might be re-phrased as: "If $x_0$ is ignored, how
many rabbits must be tested in order to estimate the regression on log dose as
accurately as it was estimated with the adjusted discriminant for 12 rabbits?"

The theoretical aspects of the question are discussed in section 16; the
calculations are described here. The only new quantity needed is the F ratio for the
regression of $y$ on the discriminators alone. This can be obtained by a new
solution of the normal equations, this time with the covariance variates omitted.
With just one covariance variate, it is quicker to use the fact that the additional
reduction to the regression of $y$ on $x_0$, after fitting $x_3$ and $x_4$, is $L_0^2/c_{00}$,
or $(.100008)^2/(.000465408)$ or 21.49. Consequently, the reduction due to a
regression of $y$ on $x_3$ and $x_4$ alone is 180.69 − 21.49, or 159.20. The F ratio,
79.60/2.525, is 31.52, whereas the F ratio with covariance is 46.80 (from
Table 6). The quantity suggested from theory for comparing the two techniques
is

$$\frac{(n_2 - 2)F}{n_2} - 1,$$

where $n_2$ is the number of d.f. in the denominator of F. These values are
{(30 × 31.52/32) − 1} or 28.55 with no covariance and {(29 × 46.80/31) − 1}
or 42.78 with covariance. The relative information is estimated as 42.78/28.55,
or 1.50, so that the use of covariance gives 50 per cent more information. In
other words about 18 rabbits would be needed if the initial blood sugars were
ignored. To a slight extent this estimate favors the covariance analysis, since
it ignores the increased accuracy that would accrue from the extra error d.f. if
18 rabbits were used without covariance.
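The whole comparison can be retraced from the quantities already obtained. The sketch below is illustrative; `info_index` is our own name for the quantity $(n_2 - 2)F/n_2 - 1$, and the inputs are the values quoted in the text.

```python
def info_index(F, n2):
    """Quantity (n2 - 2) F / n2 - 1 used to compare the two techniques."""
    return (n2 - 2) * F / n2 - 1


# Values quoted in the text.
reduction_all = 180.69            # regression of y on x0, x3 and x4
L0, c00 = 0.100008, 0.000465408   # from Table 4 (c00 scaled by 10^-3)
F_with = 46.80                    # F ratio with covariance, 2 and 31 d.f.

# Reduction lost when x0 is dropped, and the analysis without covariance.
extra_cov = L0 ** 2 / c00
reduction_disc = reduction_all - extra_cov
F_without = (reduction_disc / 2) / ((240.0 - reduction_disc) / 32)

index_without = info_index(F_without, 32)
index_with = info_index(F_with, 31)
relative = index_with / index_without
rabbits_needed = 12 * relative

print(round(extra_cov, 2), round(F_without, 2),
      round(index_without, 2), round(index_with, 2),
      round(relative, 2), round(rabbits_needed))
```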

PART II. THEORY

9. Notation. The theory will be given first for the two-population case.
We suppose that a random sample of size $N$ has been drawn from each
population. A typical discriminator is written $x_{iw\alpha}$ and a typical covariance variate
$x_{\xi w\alpha}$, where

$i, j = 1, 2, \ldots, p$ denote discriminators,
$\xi, \eta = 1, 2, \ldots, k$ denote covariance variates,
$w = 1, 2$ denotes the population, and
$\alpha = 1, 2, \ldots, N$ denotes the order within the sample.

The population mean of $x_{iw\alpha}$ is $\mu_{iw}$, and the corresponding sample mean is $\bar{x}_{iw\cdot}$.
The difference $(\mu_{i2} - \mu_{i1})$ is denoted by $\delta_i$ and the corresponding estimated
difference $(\bar{x}_{i2\cdot} - \bar{x}_{i1\cdot})$ by $d_i$.

10. Discriminant functions and generalized distance. Since we propose to
approach the theory by means of the generalized distance, it may be well to
review briefly the relation between the discriminant and the generalized distance.
In the ordinary theory (with no covariance variates) it is assumed that the
variates $x_{iw\alpha}$ follow a multivariate normal distribution, and that the covariance
matrix $\sigma_{ij}$ between $x_{iw\alpha}$ and $x_{jw\alpha}$ is the same in both populations. The
generalized distance of Mahalanobis is defined by

$$p\Delta^2 = \sum_{i,j=1}^{p} \sigma^{ij}\,\delta_i\,\delta_j, \qquad \text{where } (\sigma^{ij}) = (\sigma_{ij})^{-1}.$$

In order to estimate this quantity from the sample, we first calculate the mean
within-sample covariance $s_{ij}$, where

$$s_{ij} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N} (x_{iw\alpha} - \bar{x}_{iw\cdot})(x_{jw\alpha} - \bar{x}_{jw\cdot})\,/\,2(N - 1).$$

The estimated distance is then taken as

(12) $$pD^2 = \sum_{i,j=1}^{p} s^{ij}\,d_i\,d_j.$$

Apart from a factor $N/(N - 1)$, this is the maximum likelihood estimate.

In the discriminant function used by Fisher [1], the object is to find a linear
function $I_{w\alpha} = \sum_i M_i x_{iw\alpha}$, where the $M_i$ are chosen to maximize the ratio of the
sum of squares between samples to that within samples in the analysis of variance
of $I$. This is equivalent to maximizing the ratio of the difference between the
two sample means of $I$ to the estimated standard error of this difference. As
Fisher showed [2], the $M_i$ (apart from an arbitrary multiplier) are given by

$$M_i = \sum_{j=1}^{p} s^{ij}\,d_j.$$

Consequently, the difference between the two sample means of $I$, the discriminant
function, is

$$\sum_{i=1}^{p} M_i d_i = \sum_{i,j=1}^{p} s^{ij}\,d_i\,d_j.$$

This is exactly the same as $pD^2$ in equation (12). Thus the discriminant
function leads to the estimated distance, and vice versa.
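The identity between the difference of the two sample means of $I$ and the estimated distance is easily seen on a small example. The sketch below uses invented data (two samples of $N = 4$ observations on $p = 2$ discriminators); the data and all names are hypothetical and serve only to illustrate equation (12) and the weights $M_i$.

```python
# Hypothetical data: two samples, N = 4 observations, p = 2 discriminators.
sample1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (2.0, 3.0)]
sample2 = [(4.0, 3.0), (5.0, 5.0), (6.0, 4.0), (5.0, 6.0)]
N, p = len(sample1), 2


def mean(sample):
    return [sum(obs[i] for obs in sample) / len(sample) for i in range(p)]


def within_ss(sample, m):
    """Sums of squares and products of deviations from the sample means."""
    return [[sum((o[i] - m[i]) * (o[j] - m[j]) for o in sample)
             for j in range(p)] for i in range(p)]


m1, m2 = mean(sample1), mean(sample2)
w1, w2 = within_ss(sample1, m1), within_ss(sample2, m2)

# Mean within-sample covariance s_ij on 2(N - 1) degrees of freedom.
s = [[(w1[i][j] + w2[i][j]) / (2 * (N - 1)) for j in range(p)] for i in range(p)]

# Inverse of the 2 x 2 matrix s.
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
s_inv = [[s[1][1] / det, -s[0][1] / det],
         [-s[1][0] / det, s[0][0] / det]]

d = [m2[i] - m1[i] for i in range(p)]
M = [sum(s_inv[i][j] * d[j] for j in range(p)) for i in range(p)]
pD2 = sum(s_inv[i][j] * d[i] * d[j] for i in range(p) for j in range(p))


def score(obs):
    """Value of the discriminant I for one observation."""
    return sum(M[i] * obs[i] for i in range(p))


diff_means_I = sum(map(score, sample2)) / N - sum(map(score, sample1)) / N
print(pD2, diff_means_I)
```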


11. Extension to the present problem. In our case there are $(p + k)$ variates
($p$ discriminators, $k$ covariance variates) from which to estimate the distance.
All variates, $x_{iw\alpha}$ and $x_{\xi w\alpha}$, are assumed to follow a multivariate normal
distribution. The covariance matrix, assumed the same in both populations, now has
$(p + k)$ rows and columns, and may be denoted by

(13) $$\Lambda = \begin{pmatrix} \sigma_{ij} & \sigma_{i\eta} \\ \sigma_{\xi j} & \sigma_{\xi\eta} \end{pmatrix}.$$

For each of the covariance variates, it is known that the population means
are equal, so that the difference $\delta_\xi$ is zero. This is the fact that
distinguishes the problem from ordinary discriminant function analysis.

Hence, the generalized distance, as defined from all $(p + k)$ variates, contains
no contribution from terms in $\delta_\xi$ and is given by

(14) $$(p + k)\Delta^2 = \sum_{i,j=1}^{p} \sigma^{ij}_{(p+k)}\,\delta_i\,\delta_j.$$

The matrix $\sigma^{ij}_{(p+k)}$ is that formed by the first $p$ rows and columns of the inverse
of $\Lambda$. Note that in general this will not be the same as the matrix $\sigma^{ij}$, which is
the inverse of $\sigma_{ij}$.

In the next section we consider the estimation of this quantity from the sample
data. By analogy with the previous section, it might be guessed that the
estimate would be of the form $\sum s^{ij}_{(p+k)}\,d_i d_j$. The maximum likelihood estimate
turns out to be of this form, except that instead of $d_i$ we have $d_i'$, the difference
between the two sample means of the deviations of $x_i$ from its 'within-sample'
linear regression on the $x_\xi$.


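12. Estimation of the distance. It is known that the generalized distance
is invariant under non-singular linear transformations of the variates. For
convenience, we replace the $x_{iw\alpha}$ by variates $x'_{iw\alpha}$, where

$$x'_{iw\alpha} = x_{iw\alpha} - \sum_{\xi=1}^{k} \beta_{i\xi}\,x_{\xi w\alpha}.$$

Thus $x'_{iw\alpha}$ is the deviation of $x_{iw\alpha}$ from its population linear regression on the
$x_{\xi w\alpha}$. The population mean of $x'_{iw\alpha}$ is clearly
$\mu'_{iw} = \mu_{iw} - \sum_\xi \beta_{i\xi}\mu_{\xi w}$, and, since $\delta_\xi = 0$, the difference between
the two population means is therefore $\delta_i$.

The covariance matrix of the $x'_{iw\alpha}$, $x_{\xi w\alpha}$ may be written

(15) $$\begin{pmatrix} \sigma_{ij\cdot\xi} & 0 \\ 0 & \sigma_{\xi\eta} \end{pmatrix},$$

where $\sigma_{ij\cdot\xi}$ denotes the covariance matrix of the deviations of the $x_{iw\alpha}$ from their
regressions on the $x_{\xi w\alpha}$. It follows that in terms of the transformed variates
the generalized distance is given by

(16) $$(p + k)\Delta^2 = \sum_{i,j=1}^{p} \sigma^{ij\cdot\xi}\,\delta_i\,\delta_j,$$

where $\sigma^{ij\cdot\xi}$ is the inverse of the $p \times p$ matrix $\sigma_{ij\cdot\xi}$.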

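The joint distribution of the $2N$ observations on each of the $x'_{iw\alpha}$ and $x_{\xi w\alpha}$
is as follows:

$$(2\pi)^{-N(p+k)}\,|\sigma^{ij\cdot\xi}|^{N}\,|\sigma^{\xi\eta}|^{N} \exp\left\{-\frac{1}{2}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}\left[\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x'_{iw\alpha} - \mu'_{iw})(x'_{jw\alpha} - \mu'_{jw}) + \sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta}(x_{\xi w\alpha} - \mu_{\xi w})(x_{\eta w\alpha} - \mu_{\eta w})\right]\right\}\prod dx'_{iw\alpha}\,dx_{\xi w\alpha},$$

where $\sigma^{\xi\eta}$ is the inverse of the $k \times k$ matrix $\sigma_{\xi\eta}$.

We now proceed to estimate $\Delta^2$ in equation (16) by maximum likelihood.
For this, we obviously need the sample estimates of the $\sigma^{ij\cdot\xi}$ and the $\delta_i$, and it
will appear presently that the sample estimates of the $\beta_{i\xi}$ are also required.
However, it happens that the $\sigma^{\xi\eta}$ and the $\mu_{\xi w}$ are not needed. Hence the
relevant part of the likelihood function is

(17) $$L = N\log|\sigma^{ij\cdot\xi}| - \frac{1}{2}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x'_{iw\alpha} - \mu'_{iw})(x'_{jw\alpha} - \mu'_{jw}),$$

where

$$x'_{iw\alpha} = x_{iw\alpha} - \sum_{\xi=1}^{k}\beta_{i\xi}\,x_{\xi w\alpha}.$$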

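Differentiating first with respect to $\mu'_{jw}$, we obtain

(18) $$\sum_{\alpha=1}^{N}\sum_{i=1}^{p}\sigma^{ij\cdot\xi}(x'_{iw\alpha} - \mu'_{iw}) = 0.$$

Except in the case (with probability zero) where our estimate of $\sigma^{ij\cdot\xi}$ turns out
to be singular, these equations have no solution except

(19) $$\sum_{\alpha=1}^{N}(x'_{jw\alpha} - \mu'_{jw}) = 0$$

for every $j$, $w$. Consequently

$$\hat{\mu}'_{jw} = \bar{x}'_{jw\cdot} = \bar{x}_{jw\cdot} - \sum_{\xi=1}^{k}\beta_{j\xi}\,\bar{x}_{\xi w\cdot},$$

so that

$$\hat{\mu}'_{j2} - \hat{\mu}'_{j1} = d_j - \sum_{\xi=1}^{k}\beta_{j\xi}\,d_\xi.$$

This shows that the $\beta_{j\xi}$ must also be estimated. Now

$$\frac{\partial L}{\partial\beta_{i\xi}} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}\frac{\partial L}{\partial x'_{iw\alpha}}\,\frac{\partial x'_{iw\alpha}}{\partial\beta_{i\xi}} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}\sum_{j=1}^{p}\sigma^{ij\cdot\xi}(x_{\xi w\alpha} - \mu_{\xi w})(x'_{jw\alpha} - \mu'_{jw}).$$

Once again, unless the estimate of $\sigma^{ij\cdot\xi}$ is singular, the only solutions of the
equations formed by equating this quantity to zero are

(20) $$\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \mu_{\xi w})(x'_{jw\alpha} - \mu'_{jw}) = 0$$

for every $\xi$, $j$.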

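Since $\sum_\alpha (x'_{jw\alpha} - \mu'_{jw}) = 0$ by (19), the term in $\mu_{\xi w}$ vanishes. Substituting
for $x'$ in terms of $x$ from (17), we obtain

$$\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \bar{x}_{\xi w\cdot})\left\{(x_{jw\alpha} - \bar{x}_{jw\cdot}) - \sum_{\eta=1}^{k} b_{j\eta}(x_{\eta w\alpha} - \bar{x}_{\eta w\cdot})\right\} = 0,$$

where $b_{j\eta}$ stands for the maximum likelihood estimate of $\beta_{j\eta}$. These equations
may be written

(21) $$\sum_{\eta=1}^{k} b_{j\eta}\,E_{\xi\eta} = E_{j\xi},$$

where $E$ denotes a sum of squares or products of deviations from the sample
means, containing $2(N - 1)$ degrees of freedom. The equations are therefore
the ordinary normal equations for the 'within-sample' multiple regression of
$x_{jw\alpha}$ on the $x_{\xi w\alpha}$.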

Finally, differentiation of $L$ with respect to the $\sigma^{ij\cdot\xi}$ leads to

(22) $$2N\hat{\sigma}_{ij\cdot\xi} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x'_{iw\alpha} - \bar{x}'_{iw\cdot})(x'_{jw\alpha} - \bar{x}'_{jw\cdot}).$$

This is just the 'within-samples' sum of squares or products of the variates $x'$.
On substituting for the $x'$ in terms of the $x$ and using equations (21), we obtain

$$2N\hat{\sigma}_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k} b_{i\xi}\,E_{j\xi} = E_{ij\cdot\xi} \quad \text{(say)}.$$

To summarize, the estimated distance is given by means of the equation

$$(p + k)D^2 = \sum_{i,j=1}^{p}\hat{\sigma}^{ij\cdot\xi}\,d_i'\,d_j' = 2N\sum_{i,j=1}^{p}E^{ij\cdot\xi}\,d_i'\,d_j',$$

where $E^{ij\cdot\xi}$ is the inverse of $E_{ij\cdot\xi}$ and

$$d_i' = d_i - \sum_{\xi=1}^{k} b_{i\xi}\,d_\xi.$$

This estimate was obtained by assuming all variates jointly normally
distributed. From the form of the likelihood function (17) it can be seen that the
M.L. estimate of the distance remains the same under the less restrictive
assumptions that the $x_{\xi w\alpha}$ are fixed, while the deviations of the $x_{iw\alpha}$ from their regressions
on the $x_{\xi w\alpha}$ are jointly normal.

13. Computational procedure. An orderly procedure for calculating the
generalized distance will now be given. From this, the method for computing
the corresponding discriminant function will be shown. The computations also
lead to the generalization of Hotelling's $T^2$. The steps are as follows.

(i). First form the 'within-sample' sums of squares and products of all variates,
with $2(N - 1)$ degrees of freedom. These are the quantities denoted by
$E_{ij}$, $E_{i\xi}$, $E_{\xi\eta}$.

(ii). Invert the matrix $E_{\xi\eta}$, giving $E^{\xi\eta}$.

(iii). The regression coefficients $b_{i\xi}$, estimates of the $\beta_{i\xi}$, are now obtainable by
means of the relations

$$b_{i\xi} = \sum_{\eta=1}^{k} E_{i\eta}\,E^{\eta\xi},$$

as is clear from the usual matrix solution of equations (21).

(iv). The sums of squares and products of the deviations of the $x_i$ from their
'within-sample' regressions on the $x_\xi$ are now computed from equations (22)

$$2N\hat{\sigma}_{ij\cdot\xi} = E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k} b_{i\xi}\,E_{j\xi}.$$

(v). The final step is to invert the matrix $E_{ij\cdot\xi}$, giving $E^{ij\cdot\xi}$, and to form the
product

$$(p + k)D^2 = 2N\sum_{i,j=1}^{p} E^{ij\cdot\xi}\,d_i'\,d_j', \qquad \text{where } d_i' = d_i - \sum_{\xi=1}^{k} b_{i\xi}\,d_\xi.$$
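Steps (ii) to (v) can be sketched as a short routine. The version below is a minimal illustration, not a procedure given in the paper: the function names and the Gauss-Jordan inversion are our own choices, and the routine returns the quadratic form $\sum E^{ij\cdot\xi} d_i' d_j'$ without the factor $2N$, which applies to the two-sample case. As input it borrows the error-line values of Table 2 and the $d$'s of the numerical example purely for illustration.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def adjusted_quadratic_form(E_xx, E_xc, E_cc, d_x, d_c):
    """Steps (ii)-(v): regress out the covariance variates, then form
    Q = sum over i, j of E^{ij.xi} d'_i d'_j."""
    p, k = len(d_x), len(d_c)
    E_cc_inv = invert(E_cc)                                        # step (ii)
    b = [[sum(E_xc[i][h] * E_cc_inv[h][g] for h in range(k))       # step (iii)
          for g in range(k)] for i in range(p)]
    E_adj = [[E_xx[i][j] - sum(b[i][g] * E_xc[j][g] for g in range(k))
              for j in range(p)] for i in range(p)]                # step (iv)
    d_adj = [d_x[i] - sum(b[i][g] * d_c[g] for g in range(k))
             for i in range(p)]
    E_adj_inv = invert(E_adj)                                      # step (v)
    Q = sum(E_adj_inv[i][j] * d_adj[i] * d_adj[j]
            for i in range(p) for j in range(p))
    return b, E_adj, d_adj, Q


# Illustrative input: discriminators x3, x4 and covariance variate x0.
E_xx = [[3223.0, 1200.0], [1200.0, 3137.0]]
E_xc = [[1259.0], [1340.0]]
E_cc = [[2351.0]]
b, E_adj, d_adj, Q = adjusted_quadratic_form(
    E_xx, E_xc, E_cc, d_x=[-1164.0, -809.0], d_c=[62.0])
print([round(v, 1) for v in d_adj], round(Q, 2))
```

For two samples of size $N$ the estimated distance $(p + k)D^2$ would be obtained by multiplying the returned quadratic form by $2N$.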


When there were no covariance variates, the discriminant function $I$ had the
property that the difference between the two sample means of $I$ was equal to the
estimated distance (Section 10). This relationship can be preserved when
covariance variates are present by defining $I$ so that

$$I_{w\alpha} = \sum_{i=1}^{p} M_i\left(x_{iw\alpha} - \sum_{\xi=1}^{k} b_{i\xi}\,x_{\xi w\alpha}\right)$$

and calculating the weights $M_i$ from the equations

$$\sum_{j=1}^{p} E_{ij\cdot\xi}\,M_j = d_i'.$$

For in that case,

$$M_i = \sum_{j=1}^{p} E^{ij\cdot\xi}\,d_j'.$$

Consequently the difference between the two sample means of $I$ is

$$\sum_{i=1}^{p} M_i d_i' = \sum_{i,j=1}^{p} E^{ij\cdot\xi}\,d_i'\,d_j',$$

which (apart from the constant $2N$) is equal to $(p + k)D^2$.


14. Distribution of the estimated distance. In the ordinary case, with no
covariance variates, the frequency distribution of the estimated distance has
been given by several authors, e.g. Hsu [6]. It will be found that in our
problem the distribution is essentially the same, except that the quantity $D$ must be
multiplied by a new factor and that one set of degrees of freedom entering into
the result must be changed from $(n - p + 1)$ to $(n - p - k + 1)$.

Thus far we have assumed that all variates jointly follow a multivariate
normal distribution. It is convenient at this stage to regard the covariance variates
$x_{\xi w\alpha}$ as fixed from sample to sample, and to use the conditional distribution of
the $x_{iw\alpha}$, subject to this restriction. It is well known (e.g. Cramér [7, section
24.6]) that this conditional distribution is the multivariate normal

(23) $$(2\pi)^{-Np}\,|\sigma^{ij\cdot\xi}|^{N}\exp\left[-\frac{1}{2}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}(x_{iw\alpha} - \gamma_{iw\alpha})(x_{jw\alpha} - \gamma_{jw\alpha})\right]\prod dx_{iw\alpha},$$

where

$$\gamma_{iw\alpha} = \mu_{iw} + \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi w\alpha} - \mu_{\xi}).$$



Since the estimated distance is a function of the quantities $E_{ij\cdot\xi}$, $d_i'$, we now
find the joint distribution of these variates. The joint distribution of the
sums of squares and products $E_{ij\cdot\xi}$ is obtained by quoting a slight extension of a
result due to Bartlett [8], which may be stated as follows.

Let the variates $x_{iw\alpha}$ follow the distribution (23) and let

(i) $$E_{ij} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{iw\alpha} - \bar{x}_{iw\cdot})(x_{jw\alpha} - \bar{x}_{jw\cdot})$$

be a typical 'within-samples' sum of squares or products,

(ii) $$b_{i\xi} = \sum_{\eta=1}^{k} E_{i\eta}\,E^{\eta\xi}$$

be the 'within-samples' partial regression coefficient of $x_i$ on $x_\xi$, and

(iii) $$E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k} b_{i\xi}\,E_{j\xi}$$

be the sum of squares or products of deviations from these regressions. Then
(a) the quantities $E_{ij\cdot\xi}$ follow the Wishart distribution

$$c\,|E_{ij\cdot\xi}|^{\frac{1}{2}(n-k-p-1)}\exp\left(-\frac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}E_{ij\cdot\xi}\right)\prod dE_{ij\cdot\xi}$$

with $(n - k)$ d.f., where $n = 2(N - 1)$,

(b) this distribution is independent of that of the $b_{i\xi}$, and

(c) both distributions are independent of that of the means $\bar{x}_{iw\cdot}$, and consequently
of that of the difference $d_i = (\bar{x}_{i2\cdot} - \bar{x}_{i1\cdot})$.

The result was proved by Bartlett for a sample from a single population. The
extension to the case of two populations is straightforward and will not be given
in detail.

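From (b) and (c) it follows that the distribution of the $E_{ij\cdot\xi}$ is independent
of that of the quantities

$$d_i' = d_i - \sum_{\xi=1}^{k} b_{i\xi}\,d_\xi.$$

Further, with the $x_\xi$ variates fixed, the $d_i'$ are linear functions of the $x_{iw\alpha}$ with
constant coefficients and hence follow a multivariate normal distribution, Wilks
[9]. We now find the means and the covariance matrix of this joint distribution.
From the joint distribution (23) of the $x_{iw\alpha}$, it is easily seen that

$$E(d_i) = \delta_i + \sum_{\xi=1}^{k}\beta_{i\xi}\,d_\xi.$$

Also, since by standard regression theory the $b_{i\xi}$ are unbiased estimates of the
$\beta_{i\xi}$,

$$E\left(\sum_{\xi=1}^{k} b_{i\xi}\,d_\xi\right) = \sum_{\xi=1}^{k}\beta_{i\xi}\,d_\xi.$$

Hence, by subtraction,

(25) $$E(d_i') = \delta_i.$$

Now

$$\mathrm{Cov}(d_i'\,d_j') = \mathrm{Cov}\left(d_i - \sum_{\xi=1}^{k} b_{i\xi}\,d_\xi\right)\left(d_j - \sum_{\eta=1}^{k} b_{j\eta}\,d_\eta\right).$$

By (c) the distributions of the $d_i$, $b_{j\eta}$ are independent, so that there will be no
contribution from products of the form $d_i b_{j\eta}$. Hence

(26) $$\mathrm{Cov}(d_i'\,d_j') = \mathrm{Cov}(d_i\,d_j) + \sum_{\xi,\eta=1}^{k} d_\xi\,d_\eta\,\mathrm{Cov}(b_{i\xi}\,b_{j\eta}).$$

Since $d_i$ is the difference between the means of two samples of size $N$,
$\mathrm{Cov}(d_i\,d_j)$ is $2\sigma_{ij\cdot\xi}/N$. The covariance of $b_{i\xi}$ and $b_{j\eta}$ is more troublesome. Writing
the expressions for these regression coefficients in terms of the original data, we
have

$$\mathrm{Cov}(b_{i\xi}\,b_{j\eta}) = \sum_{\lambda,\nu=1}^{k}\mathrm{Cov}(E_{i\lambda}E_{j\nu})\,E^{\lambda\xi}E^{\nu\eta} = \sum_{\lambda,\nu=1}^{k}E^{\lambda\xi}E^{\nu\eta}\sum_{w,z=1}^{2}\sum_{\alpha,\beta=1}^{N}(x_{\lambda w\alpha} - \bar{x}_{\lambda w\cdot})(x_{\nu z\beta} - \bar{x}_{\nu z\cdot})\,\mathrm{Cov}(x_{iw\alpha}\,x_{jz\beta}).$$

Since successive observations are assumed independent, the covariance term
vanishes unless $w = z$ and $\alpha = \beta$, in which case it equals $\sigma_{ij\cdot\xi}$. Thus

$$\mathrm{Cov}(b_{i\xi}\,b_{j\eta}) = \sigma_{ij\cdot\xi}\sum_{\lambda,\nu=1}^{k}E^{\lambda\xi}E^{\nu\eta}E_{\lambda\nu} = \sigma_{ij\cdot\xi}\,E^{\xi\eta}.$$

Finally, from (26)

(27) $$\mathrm{Cov}(d_i'\,d_j') = \sigma_{ij\cdot\xi}\left(\frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}\,d_\xi\,d_\eta\right) = v\,\sigma_{ij\cdot\xi} \quad \text{(say)}.$$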

Having obtained the distributions of the $E_{ij\cdot\xi}$, $d_i'$, we may apply Hsu's result
[6] for the general distribution of Hotelling's $T^2$. In our notation, this may be
stated as follows.

If the variates $d_i'/\sqrt{v}$ follow the multivariate normal distribution with means
$\delta_i/\sqrt{v}$ and covariance matrix $\sigma_{ij\cdot\xi}$, and if the variates $E_{ij\cdot\xi}$ follow the Wishart
distribution with $(n - k)$ d.f. and covariance matrix $\sigma_{ij\cdot\xi}$, the two distributions
being independent, then

$$y = \sum_{i,j=1}^{p} E^{ij\cdot\xi}\,d_i'\,d_j'/v$$

follows the distribution

(28) $$e^{-\tau}\sum_{h=0}^{\infty}\frac{\tau^h}{h!}\;\frac{y^{\frac{1}{2}p + h - 1}\,(1 + y)^{-\frac{1}{2}(n - k + 1) - h}}{B\{\frac{1}{2}p + h,\ \frac{1}{2}(n - k - p + 1)\}}\,dy,$$

where

$$\tau = \frac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\,\delta_i\,\delta_j/v,$$

$$v = \frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}\,d_\xi\,d_\eta, \qquad n = 2(N - 1).$$

This distribution is, of course, the distribution of the ratio of two independent
values of $\chi^2$ with $p$ and $(n - k - p + 1)$ d.f. respectively, in the case where the
numerator is non-central.


15. Tests of significance. This result leads to the extension of Hotelling's
$T^2$ test. For if $\delta_i = 0$, $(i = 1, 2, \ldots, p)$, then $\tau$ is zero and

$$\sum_{i,j=1}^{p} E^{ij\cdot\xi}\,d_i'\,d_j'$$

is distributed as $vpF/(n - k - p + 1)$, with $p$ and $(n - k - p + 1)$ d.f. The
distribution (28) above gives the power function of this test.

We may also wish to apply a test of this type to a subgroup $x_i$ of the
discriminators $(i = 1, 2, \ldots, q < p)$. Speaking popularly, this is a test of the null
hypothesis that the above variates $x_i$ contribute nothing to the discrimination
between the two populations, given that the remaining discriminators and the
covariance variates have already been included.* To see what is meant more
precisely, consider the following transformation:

$$x_i' = x_i - \sum_{l=q+1}^{p}\beta_{il}\,x_l - \sum_{\xi=1}^{k}\beta_{i\xi}\,x_\xi, \qquad i = 1, 2, \ldots, q;$$

$$x_l' = x_l - \sum_{\xi=1}^{k}\beta_{l\xi}\,x_\xi, \qquad l = q + 1, \ldots, p;$$

$$x_\xi' = x_\xi, \qquad \xi = 1, 2, \ldots, k,$$

where the $\beta$'s are population regression coefficients. Then it is not difficult to
see that the distance is now given by

$$(p + k)\Delta^2 = \sum_{i,j=1}^{q}\sigma^{ij\cdot l\xi}\,\delta_i'\,\delta_j' + \sum_{l,m=q+1}^{p}\sigma^{lm\cdot\xi}\,\delta_l\,\delta_m,$$

where $\sigma^{ij\cdot l\xi}$ is the inverse of the covariance matrix of the deviations of the $x_i$
from their regressions on the $x_l$ plus the $x_\xi$, and

$$\delta_i' = \delta_i - \sum_{l=q+1}^{p}\beta_{il}\,\delta_l.$$

If $\delta_i' = 0$ $(i = 1, 2, \ldots, q)$, the distance is exactly the same as it
would be if the variates $x_i$ had not been included. The test is therefore a test
of the null hypothesis that $\delta_i' = 0$, $(i = 1, 2, \ldots, q)$.

* The test is illustrated in section 7.

If the remaining discriminators $x_l$ and the covariance variates $x_\xi$ are
regarded as fixed, the method of proof in the previous section provides a test
for this hypothesis also. It is found that the sums of squares or products $E_{ij\cdot l\xi}$
follow a Wishart distribution with $(n - k - p + q)$ d.f., while the quantities

$$d_i' = d_i - \sum_{l=q+1}^{p} b_{il}\,d_l - \sum_{\xi=1}^{k} b_{i\xi}\,d_\xi$$

are normally distributed, with zero means when the null hypothesis is true. This
leads to the result that

$$\sum_{i,j=1}^{q} E^{ij\cdot l\xi}\,d_i'\,d_j'$$

is distributed as $v'qF/(n - k - p + 1)$, with $q$ and $(n - k - p + 1)$ d.f., and

$$v' = \frac{2}{N} + \sum E^{\lambda\nu}\,d_\lambda\,d_\nu,$$

the sum extending over both the covariance variates and the discriminators that
are not being tested.

16. Discussion of the gain due to covariance. In this section we attempt to
construct a measure of the amount that has been gained by the use of the
covariance variates. Only a preliminary discussion will be given: a complete
discussion would be rather lengthy, owing to the many different uses to which the
discriminant function can be put. Perhaps the problem can most easily be seen
by considering the effect on Hotelling's generalized test of significance.

The power function of this test, as obtained from equation (28) section 14,
depends on four factors: the level of significance that is chosen, the degrees of
freedom $n_1$ and $n_2$ in the numerator and denominator of F, and the parameter $\tau$.
If the covariance variates were ignored, the usual test could be applied to the
discriminators alone. In this case we would have

$$n_1 = p, \qquad n_2' = n - p + 1, \qquad \tau' = \tfrac{1}{2}\sum\sigma^{ij}\,\delta_i\,\delta_j/v', \quad \text{where } v' = 2/N.$$

With the covariance variates, we have

$$n_1 = p, \qquad n_2 = n - p - k + 1, \qquad \tau = \tfrac{1}{2}\sum\sigma^{ij\cdot\xi}\,\delta_i\,\delta_j/v,$$

where

$$v = \frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}\,d_\xi\,d_\eta.$$

The first point to note is that

$$\sum\sigma^{ij\cdot\xi}\,\delta_i\,\delta_j \geq \sum\sigma^{ij}\,\delta_i\,\delta_j.$$

This is an instance of the general result that the addition of new variates cannot
decrease the value of $p\Delta^2$. To see this, replace the covariance variates by their
deviations from their regressions on the discriminators. This transformation
gives

(29) $$\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\,\delta_i\,\delta_j - \sum_{i,j=1}^{p}\sigma^{ij}\,\delta_i\,\delta_j = \sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta\cdot i}\,\delta_\xi'\,\delta_\eta',$$

where

$$\delta_\xi' = \delta_\xi - \sum_{i=1}^{p}\beta_{\xi i}\,\delta_i.$$

Since the term on the right of equation (29) is a positive definite quadratic form,
the result follows.

Consequently, the first effect of the covariance variates is to make the
numerator of $\tau$ greater than that of $\tau'$. As a partial compensation, the denominator
$v$ is also greater than $v'$, but it may be shown that the difference in the
denominators will usually be trivial if $k$ is small relative to $n$. We therefore expect $\tau$
to be greater than $\tau'$. Now for fixed $n_1$, $n_2$ and significance level, it is well
known that the power function (28) is monotone increasing with $\tau$. Hence,
other things being equal, the increase in $\tau$ due to the covariance variates leads to
a more powerful test.

The two power functions, however, differ in another respect, in that with
covariance the value of $n_2$ is reduced from $(n - p + 1)$ to $(n - p - k + 1)$. This
decrease in the number of degrees of freedom in the denominator of F will to
some extent offset the gain from an increased $\tau$. Examination of Tang's tables
[10] indicates, however, that if the degrees of freedom are substantial, this effect
will not be important. Moreover, in most practical applications, $k$ is likely to
be only 1 or 2. Hence, as a first approximation the effect will be ignored, though
to do so tends to overestimate the advantage of covariance.

Suppose now that $\tau = r\tau'$, where $r > 1$. Since $\tau'$ is proportional to $N$, the 
size of sample taken from each population, we could make $\tau' = \tau$ by increasing 
the size of sample (when covariance is not used) from $N$ to $rN$. This suggests 
that the ratio $\tau/\tau'$ can be used, as a first approximation, to measure the relative 
accuracy obtained with and without the use of covariance. This measure carries 
approximately the usual interpretation that the inferior method would become 
as good as the superior method if the sample size for the inferior method were 
increased by the factor $r$. A further refinement could be made to take account 
of the difference in the $n_2$ values. By trial and error applied to Tang's tables, 
one could determine $r'$ so that the two power functions would be as nearly 
coincident as possible. 

In practice, the ratio $\tau/\tau'$ must be estimated from the data. From the power 
function in equation (28) it is found by integration that the mean value of $y$ is 

$(2\tau + p)/(n_2 - 2)$, 

so that an unbiased estimate of $\tau$ is 

$\{(n_2 - 2)\,y - p\}/2$. 






This suggests that the quantity 

, j ui - 2) p ^ 

ni 

should be calculated both with and without covariance. The ratio of the two 
values will probably not be an unbiased estimate of $\tau/\tau'$, but may be used 
pending further information about its sampling distribution. This type of 
calculation is made for the numerical example in section 8. 
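
The estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: it takes the mean of $y$ to be $(2\tau + p)/(n_2 - 2)$ as quoted in the text, and the function names `tau_estimate` and `relative_accuracy` are hypothetical, not taken from the paper.

```python
def tau_estimate(y: float, p: int, n2: int) -> float:
    """Moment estimate of tau obtained by solving E[y] = (2*tau + p) / (n2 - 2)."""
    if n2 <= 2:
        raise ValueError("n2 must exceed 2 for the mean of y to exist")
    return ((n2 - 2) * y - p) / 2.0


def relative_accuracy(y_cov: float, n2_cov: int,
                      y_plain: float, n2_plain: int, p: int) -> float:
    """Ratio of the two tau estimates (with covariance / without covariance)."""
    return tau_estimate(y_cov, p, n2_cov) / tau_estimate(y_plain, p, n2_plain)
```

As the text notes, the ratio of two unbiased estimates is not itself unbiased, so the value returned by `relative_accuracy` should be read as a rough guide only.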

17. The case of a row by column classification. Thus far the discussion has 
been confined to the case where there are only two populations. The technique 
may also be used when there are more than two populations. The difference 
$\delta_i$ between the two population means is replaced by some linear function of the 
population means. As an illustration we consider a row by column classification, 
the case that arises in the numerical example. No detailed proofs will be given, 
though it is hoped that the theory can be fairly easily developed from the 
mathematical model. 

A typical variate is $x_{iwz}$, where $i = 1, 2, \cdots, p$ denotes the variate, 
$w = 1, 2, \cdots, r$ denotes the row and $z = 1, 2, \cdots, c$ denotes the column, 
there being one observation in each cell. The variates follow a multivariate normal 
distribution, with covariance matrix $\sigma_{ij\cdot\xi}$ and means 

li 

EiXuiii) = fli Ptw + Ti3 + 2 " ^i-)^ 

i-1 

where $\rho_{iw}$ denotes the effect of the row and $\gamma_{iz}$ that of the column. Without loss 
of generality we may assume that 

$\sum_w \rho_{iw} = \sum_z \gamma_{iz} = 0$. 

In addition, there exists a known set of variates $t_z$ such that 

$\gamma_{iz} = \delta_i t_z$, $\sum_z t_z = 0$. 

That is, the column constants have a linear regression on a set of known numbers. 
The following are the maximum likelihood estimates of the relevant constants. 

q-il 



where 





In the notation used for numerical calculation, 

(d. - Z &.{ df) _ _ di_ di = ^t. Xt; , 


S, = 


i2' 


rY^t] rYtl 

the quantity X{, being the column total Finally 


The distributional properties are similar to those in the two-population case. 
The quantities $E_{ij}$ follow a Wishart distribution with $(rc - r - 1 - k)$ d.f. 
and covariance matrix $\sigma_{ij\cdot\xi}$. The variates $d_i'$ follow a multivariate normal 
distribution with means $r\delta_i \sum t^2$ and covariance 

ffif ((r7jt\ -h HE^’' d(d,) — (say). 

Consequently, 

y = ZE'^-Uid'i 

is distributed as $vpF/(rc - r - p - k)$ with $p$ and $(rc - r - p - k)$ d.f. and 
parameter 

T = h(r:stlW^\s,/v. 

Thus in the numerical example, with $r = 12$, $c = 4$, $p = 2$, $k = 1$, this procedure 
would have given an F test of the null hypothesis $\tau = 0$, where F has 2 and 33 
d.f. However, the contribution from 2 degrees of freedom was deliberately 
omitted from the quantities $E_{ij}$, so that F actually had 2 and 31 d.f. 
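
The degrees-of-freedom bookkeeping in this paragraph is easy to check mechanically. The helper below is a hypothetical illustration (the name `row_column_df` is not from the paper); it encodes only the two counts quoted in the text, $(rc - r - 1 - k)$ for the error matrix and $p$, $(rc - r - p - k)$ for F.

```python
def row_column_df(r: int, c: int, p: int, k: int):
    """Degrees of freedom for an r-by-c layout with p discriminators, k covariates."""
    error_df = r * c - r - 1 - k          # d.f. of the error sums of products
    f_df = (p, r * c - r - p - k)         # numerator and denominator d.f. of F
    return error_df, f_df


# The numerical example of the text: r = 12, c = 4, p = 2, k = 1.
error_df, f_df = row_column_df(12, 4, 2, 1)
```

The deliberate omission of 2 further degrees of freedom mentioned in the text is a feature of the worked example and is not modelled here.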


PART III 

18. Justification of the ‘dummy variate’ approach. It remains to show that 
the method of calculation used in the example (sections 6 and C) is equivalent to 
that derived from theory. There are two chief points to prove. First, that the 
M’s found from the equations 

(30) Z = d[ 

are proportional to the corresponding L's found from the equations 

(31) $\sum_a S_{ij} L_j = d_i$, 

where the suffix $a$ denotes summation over both $x_i$ and $x_\xi$ variates. 

Now, since $S_{ij} = E_{ij} + d_i d_j/240$, equations (31) are the same as 

(32) $\sum_a E_{ij} L_j = d_i\,(1 - \sum_a L_j d_j/240)$. 

Hence the L's in (31) are proportional to the values found from the equations 

(33) $\sum_a E_{ij} L_j' = d_i$. 





But it is well known that if the $L_\xi'$ are eliminated one by one from equations (33), 
we obtain 

i 


which is the same as (30). This proves the first point. 

The second point to establish is that the F test in the example is the same as 
that obtained from theory. In section 15, it was shown that 

(34) $\sum E'^{ij} d_i' d_j' / v$ 

is distributed as $pF/(n - p - k + 1)$. In the analysis of variance of Table 6, 
section 6, the quantity following the same distribution was 

(35) $\dfrac{S_a - S_\xi}{240 - S_a}$, 

where 

$S_a = \sum_a S^{ij} d_i d_j$, $S_\xi = \sum_\xi S^{\xi\eta} d_\xi d_\eta$. 

Since equations (31) and (32) have the same solution, we must have 

$S^{ij} = E^{ij}(1 - \sum_a L_j d_j/240) = E^{ij}(1 - S_a/240)$. 

Multiplying both sides by $d_i d_j$ and summing over all $i$, $j$, we obtain 

$S_a = E_a(1 - S_a/240) = E_a/(1 + E_a/240)$, 

where $E_a$ is defined analogously to $S_a$. Similarly 

$S_\xi = E_\xi/(1 + E_\xi/240)$. 

Hence 

(36) $\dfrac{S_a - S_\xi}{240 - S_a} = \dfrac{E_a - E_\xi}{240 + E_\xi} = \dfrac{E_a - E_\xi}{v}$. 
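
The algebra leading to (36) can be spot-checked numerically. The sketch below assumes only the two scalar relations stated in the text, $S = E/(1 + E/240)$ for both $S_a$ and $S_\xi$; the input values are arbitrary illustrative numbers, not data from the paper, and the function names are hypothetical.

```python
def s_from_e(e: float, n: float = 240.0) -> float:
    """S = E / (1 + E/n), the relation between the S- and E-type quadratic forms."""
    return e / (1.0 + e / n)


def ratio_from_s(e_a: float, e_xi: float, n: float = 240.0) -> float:
    """Left-hand side of (36): (S_a - S_xi) / (n - S_a)."""
    s_a, s_xi = s_from_e(e_a, n), s_from_e(e_xi, n)
    return (s_a - s_xi) / (n - s_a)


def ratio_from_e(e_a: float, e_xi: float, n: float = 240.0) -> float:
    """Right-hand side of (36): (E_a - E_xi) / (n + E_xi)."""
    return (e_a - e_xi) / (n + e_xi)
```

For any positive pair with $E_a > E_\xi$ the two functions should agree to rounding error, which is the content of (36).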

Transform the variates $x_i$, $x_\xi$ into variates $x_i'$, $x_\xi$, where $x_i' = x_i - \sum b_{i\xi} x_\xi$. 

It is easy to see that this transforms 

$\sum_a E^{ij} d_i d_j$ into $\sum_\xi E^{\xi\eta} d_\xi d_\eta + \sum E'^{ij} d_i' d_j'$. 

That is, 

$E_a = E_\xi + \sum E'^{ij} d_i' d_j'$, 

since the quantity on the left is invariant under non-singular linear 
transformations. Hence from (36), 

$\dfrac{S_a - S_\xi}{240 - S_a} = \sum E'^{ij} d_i' d_j' / v$. 





From (34) and (35), this establishes the equivalence of the F tests. While the 
proof has been given only for the type of data encountered in the example, the 
same method will apply to other types of data. 

In conclusion, we wish to thank the referees for many helpful suggestions in 
connection with the presentation of this paper. 

REFERENCES 

[1] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals 
of Eugenics, Vol. 7 (1936), pp. 179-188. 

[2] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of 
Eugenics, Vol. 8 (1938), pp. 376-380. 

[3] H. Hotelling, "The generalization of Student's ratio," Annals of Math. Stat., Vol. 
2 (1931), pp. 360-378. 

[4] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst. 
Sci. India, Vol. 12 (1936), pp. 49-66. 

[5] C. I. Bliss and H. P. Marks, "The biological assay of insulin," Quart. Jour. Pharm. 
and Pharmacol., Vol. 12 (1939), pp. 82-110; 182-205. 

[6] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938), 
pp. 231-243. 

[7] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946. 

[8] M. S. Bartlett, "On the theory of statistical regression," Roy. Soc. Edin. Proc., Vol. 
63 (1933), pp. 271-277. 

[9] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 70. 

[10] P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Memoirs, 
Vol. 2 (1938), pp. 126-167. 



ON THE KOLMOGOROV-SMIRNOV LIMIT THEOREMS FOR 
EMPIRICAL DISTRIBUTIONS 

By W. Feller 
Cornell University* 

Summary. Unified and simplified derivations are given for the limiting forms 
of the difference (1) between the empirical distribution of a large sample and the 
corresponding theoretical distribution and (2) between the distributions of two 
large samples. 

1. Introduction. Let $X_1, \cdots, X_N$ be mutually independent random variables 
with the common cumulative distribution function $F(x)$. Let $X_1^*, \cdots, X_N^*$ be 
the same set of variables rearranged in increasing order of magnitude. 
The empirical distribution (or sum-polygon) of the sample $X_1, \cdots, X_N$ is the step 
function $S_N(x)$ defined by 

(1.1) $S_N(x) = 0$ for $x < X_1^*$; $S_N(x) = k/N$ for $X_k^* \le x < X_{k+1}^*$; $S_N(x) = 1$ for $x \ge X_N^*$. 

In other words, $N \cdot S_N(x)$ equals the number of variables $X_k$ which do not exceed 
$x$. We expect intuitively that $S_N(x) \to F(x)$ as $N \to \infty$. In fact, if this were 
not so the notions of distribution and sample would be meaningless. The so-called 
$\omega^2$-criterion of von Mises [4] provides rough estimates for the probable 
deviations of $S_N(x)$ from $F(x)$ for certain forms of $F(x)$ (cf. von Mises [4]). A 
much stronger result is due to A. Kolmogorov and is of great interest in the 
theory of non-parametric estimation (Kolmogorov [3]). The maximum of the 
deviation $|S_N(x) - F(x)|$ is a random variable $D_N$ whose distribution is easily 
seen to be independent of the special form of $F(x)$ provided only that $F(x)$ is 
continuous.¹ The exact distribution of $D_N$ is not known, but Kolmogorov found 
that $N^{1/2} D_N$ has a limiting distribution. More precisely we have 

Theorem 1 (Kolmogorov [1]). Suppose that $F(x)$ is continuous and define 
the random variable $D_N$ by 

(1.2) $D_N = \text{l.u.b.}\,|S_N(x) - F(x)|$. 

* Research under an ONR contract. 

¹ This fact will not be used explicitly in the sequel but follows as a byproduct from our 
proofs. A simple direct proof consists in considering the random variables $Y_k = F(X_k)$ 
which are uniformly distributed; the maximum deviation of the empirical distribution 
of the new sample $(Y_k)$ from the uniform distribution has the same distribution as $D_N$; 
cf. Kolmogorov [1]. 




Then for every fixed $z > 0$ as $N \to \infty$ 

(1.3) $\Pr\{D_N < zN^{-1/2}\} \to L(z)$, 

where $L(z)$ is the cumulative distribution function which for $z > 0$ is given by either 
of the following equivalent relations² 

(1.4) $L(z) = 1 - 2\sum_{\nu=1}^{\infty} (-1)^{\nu-1} e^{-2\nu^2 z^2} = (2\pi)^{1/2} z^{-1} \sum_{\nu=1}^{\infty} e^{-(2\nu-1)^2\pi^2/8z^2}$. 

For $z < 0$ we have, of course, $L(z) = 0$. 
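
Theorem 1 can be made concrete with a short computation. The sketch below is an illustration under assumptions, not part of the paper: `kolmogorov_distance` evaluates $D_N$ for a finite sample by checking both sides of each jump of $S_N(x)$, and `L_alternating` and `L_theta` evaluate the two series in (1.4) truncated at a fixed number of terms. All three names are hypothetical.

```python
import math


def kolmogorov_distance(sample, cdf):
    """D_N = sup |S_N(x) - F(x)| for a continuous cdf, checked at the jump points."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf(x)
        # S_N jumps from (k-1)/n to k/n at the k-th order statistic.
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


def L_alternating(z, terms=100):
    """First form in (1.4): 1 - 2 * sum (-1)^(v-1) exp(-2 v^2 z^2)."""
    if z <= 0:
        return 0.0
    s = sum((-1) ** (v - 1) * math.exp(-2.0 * v * v * z * z)
            for v in range(1, terms + 1))
    return 1.0 - 2.0 * s


def L_theta(z, terms=100):
    """Second form in (1.4): sqrt(2 pi)/z * sum exp(-(2v-1)^2 pi^2 / (8 z^2))."""
    if z <= 0:
        return 0.0
    c = math.pi ** 2 / (8.0 * z * z)
    s = sum(math.exp(-((2 * v - 1) ** 2) * c) for v in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / z * s
```

For moderate $z$ the two forms agree to rounding error; the second converges faster when $z$ is small, while the first can lose accuracy there through cancellation.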

Equally interesting is Smirnov's result concerning the maximum difference 
between the empirical distributions of two samples with the same cumulative 
distribution. 

Theorem 2 (Smirnov [5]). Let $(X_1, \cdots, X_m)$ and $(Y_1, \cdots, Y_n)$ be two 
samples of mutually independent random variables having a common continuous 
distribution $F(x)$. Let $S_m(x)$ and $T_n(x)$ be the corresponding empirical distribution 
functions and define a new random variable $D_{m,n}$ by 

(1.5) $D_{m,n} = \text{l.u.b.}\,|S_m(x) - T_n(x)|$. 

Put 

(1.6) $N = \dfrac{mn}{m + n}$ 

and suppose that $m \to \infty$, $n \to \infty$ so that 

(1.7) $\dfrac{m}{n} \to a$, 

where $a$ is a constant. Then for every fixed $z > 0$ 

(1.8) $\Pr\{D_{m,n} < zN^{-1/2}\} \to L(z)$, 

where $L(z)$ is the same as in (1.4). 

m bLrilrtmptetelfdifcS methoS^ *k'1'“°'' ““ 

„( depth proved iir . (Kota™v (2^ 

M .Iteroative proof of liotaogorov’o fcorom i. duo to Smtaov ' 

Burprismg that Smirnov’s pinofsiTOuire a ^ J 

considerations. It is the purDosn of tlm technique and many auxiliary 

of the two theorems which are baaed nn unified proofs 

-_^s union are based on methods of great genorallty.® The now 

* The equivalence of the two formulnn in n 111 

formation formula for theta-fmictions Wo 'tlnil 7 ‘’•'‘'en trana- 

(1.4). The second is more useCul for small z A 7ble77V-!‘' ’‘oprosentation in 

It IS mpnntcd 111 ihc niesent issue of thn /(«« ^ Smirnov (0). 

“ Amon. oth, > results which can be proved by f ^PP’ 27!)~281). 

^e.-orru.nandhrst,assa.timeproa^^^^^ 





proof is not simple but simpler than the original ones. At any rate, it requires 
essentially only routine manipulations with generating functions and their 
limiting form, the Laplace transforms. However, the paper aims mostly at a 
unification of methods. 

As a byproduct of the proof we obtain 

Theorem 3. Let $A_N$ be the number of points $x$ where the step-polygon $S_N(x)$ 
of Theorem 1 leaves the strip $F(x) \pm zN^{-1/2}$. The expected value of the random 
variable $A_N$ satisfies the asymptotic relation 

(1.9) $E(A_N) \sim 2(2\pi N)^{1/2}\{1 - \Phi(2z)\}$, 

where $\Phi(x)$ is the normalized Gaussian distribution. 

An analogous corollary to Theorem 2 was given by Smirnov [6]; formula (1.9) 
holds also for the number of intersections of the graph of $S_m(x)$ with the 
step-polygons $T_n(x) \pm zN^{-1/2}$. These results should come as a surprise to most 
statisticians. According to Theorem 1 there is a positive probability that $A_N = 0$ 
and nevertheless $E(A_N)$ is of the order of magnitude $N^{1/2}$. The explanation 
lies in the fact that if $S_N(x)$ crosses the curve $F(x) + zN^{-1/2}$ at some point then it 
is extremely likely that $S_N(x)$ will in some neighborhood continue to fluctuate 
around values $F(x) + zN^{-1/2}$, crossing that curve a great many times. The 
difference $S_N(x) - F(x)$ exhibits, in the limit $N \to \infty$, many small oscillations. This 
phenomenon is related to the well-known fact that the path of a particle subject 
to the Einstein-Wiener diffusion process has no derivatives. 

Instead of the absolute values of the differences we may consider the differ¬ 
ences themselves and derive two parallel theorems for the maximum and the 
minimum. As an example we shall prove 

Theorem 4. With the notations and assumptions of Theorem 1 let 

(1.10) $D_N^+ = \text{l.u.b.}\,\{S_N(x) - F(x)\}$. 

Then 

(1.11) $\Pr\{D_N^+ < zN^{-1/2}\} \to 1 - e^{-2z^2}$. 


The proof is simpler than that of Theorem 1 but uses the same method. 
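
As a small illustration of Theorem 4, the one-sided statistic and its limiting distribution can be written down directly. This is a sketch under assumptions: `d_plus` and `one_sided_limit` are hypothetical names, and the supremum of $S_N(x) - F(x)$ is taken at the jump points of $S_N$, which is valid for continuous $F$.

```python
import math


def d_plus(sample, cdf):
    """One-sided deviation sup {S_N(x) - F(x)}, attained at an order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return max(k / n - cdf(x) for k, x in enumerate(xs, start=1))


def one_sided_limit(z):
    """Limiting value 1 - exp(-2 z^2) of Pr{D_N^+ < z N^(-1/2)} for z > 0."""
    return 1.0 - math.exp(-2.0 * z * z) if z > 0 else 0.0
```

The limit has a closed form, so no series truncation is involved here, in contrast to the two-sided case.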

2. Notations and preliminary remarks. For printing convenience it is desirable 
to avoid complicated subscripts and we shall therefore use the following 
notation for binomial coefficients 

(2.1) $C(n, k) = \binom{n}{k}$. 

Similarly, for the general term of the binomial distribution we shall write 

(2.2) $B(n, k; p) = C(n, k)\,p^k (1 - p)^{n-k}$. 





If $A$ is an event, $\bar{A}$ will denote its negation (complementary event). Finally 

(2.3) $\Pr\{A \mid B\}$ 

denotes the conditional probability of $A$ for given $B$. 

Our proofs depend on a special case of the continuity theorem for characteristic 
functions. Since we shall deal only with probability density functions 
$f(t)$ which vanish for $t < 0$ it is preferable to use, instead of the characteristic 
function, the Laplace transform 

(2.4) $\phi(s) = \int_0^\infty e^{-st} f(t)\,dt$. 

(This amounts to using the variable $-s$ instead of the usual $is$ and therefore 
$\phi(s)$ obeys the formal rules for characteristic functions.) 

For any sequence $\{u_k\}$ $(k = 1, 2, \cdots)$ of non-negative numbers we define the 
generating function $u(\lambda)$ by 

(2.5) $u(\lambda) = \sum_{k=1}^{\infty} u_k \lambda^k$. 

Now let $\delta > 0$ be fixed and consider the step-function $f_\delta(t)$ defined by 

(2.6) $f_\delta(t) = u_k$ for $(k - 1)\delta \le t < k\delta$ 

($k = 1, 2, \cdots$; $f_\delta(t) = 0$ for $t < 0$). Its Laplace transform is 

(2.7) $\phi_\delta(s) = \dfrac{e^{\delta s} - 1}{s}\,u(e^{-\delta s})$. 

We have, therefore, the continuity-theorem: If, as $\delta \to 0$, 

(2.8) $\delta\,u(e^{-\delta s}) \to \phi(s)$, 

then for every fixed $t > 0$ 

(2.9) $u_k \to f(t)$ when $k\delta \to t$; 

conversely, if (2.9) holds then (2.8) is true. 
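
The direction "(2.9) implies (2.8)" is easy to observe numerically. The sketch below is an illustration only: it takes $u_k = f(k\delta)$ for a known density-like function $f$, truncates the generating function at `k_max` terms, and compares $\delta\,u(e^{-\delta s})$ with the Laplace transform. The function name and the choice $f(t) = e^{-t}$, whose transform is $1/(1+s)$, are assumptions made for the example.

```python
import math


def scaled_generating_function(f, s, delta, k_max):
    """delta * sum_{k=1}^{k_max} f(k*delta) * exp(-delta*s*k), cf. (2.5) and (2.8)."""
    lam = math.exp(-delta * s)
    total = 0.0
    power = 1.0
    for k in range(1, k_max + 1):
        power *= lam                      # power == lam ** k
        total += f(k * delta) * power
    return delta * total


# With f(t) = exp(-t) the Laplace transform is 1 / (1 + s).
approx = scaled_generating_function(lambda t: math.exp(-t), 1.0, 1e-3, 50_000)
```

With $\delta = 10^{-3}$ the value of `approx` differs from $\phi(1) = 0.5$ by roughly $\delta/2$, and the gap shrinks as $\delta \to 0$.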

3. Proof of Theorem 1. Since $F(x)$ is continuous it is possible to define 
numbers $x_k$ such that 

(3.1) $F(x_k) = \dfrac{k}{N}$, $(k = 1, 2, \cdots, N - 1)$. 

This definition is unique except when $F(x) = k/N$ within an entire interval, in 
which case we define $x_k$ as the left endpoint of that interval. 

Let $c > 0$ be an integer. We shall evaluate the probability of the event 
$D_N > c/N$ and we shall later put 

(3.2) $c = zN^{1/2}$, $N \to \infty$. 





Suppose first that for some particular $x$ 

(3.3) $S_N(x) - F(x) > \dfrac{c}{N}$. 

This point $x$ is contained in a maximal interval in which (3.3) holds and at the 
right endpoint $\xi$ of this interval we shall have 

(3.4) $S_N(\xi) - F(\xi) = \dfrac{c}{N}$. 

Now $S_N(\xi)$ is necessarily a number of the form $r/N$ with an integer $r$. Since $c$ 
is an integer also $F(\xi) = k/N$ and hence $\xi = x_k$ for some $k$. From (3.4) we 
conclude that 

(3.5) $X^*_{k+c} < x_k$, $X^*_{k+c+1} > x_k$ 

or in other words: exactly $k + c$ among the $N$ variables $X_j$ are smaller than $x_k$. 
Denote this event by $A_k(c)$. The inequality (3.3) takes place for some $x$ if, and 
only if, at least one among the events $A_1(c), \cdots, A_N(c)$ occurs. The argument 
applies equally to $c < 0$ and shows that the event $D_N > c/N$ occurs if, and only if, 
at least one among the events 

(3.6) $A_1(c), A_1(-c), A_2(c), A_2(-c), \cdots, A_N(c), A_N(-c)$ 

occurs. 

Let $U_r$ and $V_r$ be the events that in the sequence (3.6) the first event to occur 
is $A_r(c)$ or $A_r(-c)$, respectively. More formally, the events $U_r$ and $V_r$ are 
defined by 

(3.7) $U_r = \bar{A}_1(c)\bar{A}_1(-c) \cdots \bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)A_r(c)$, 
      $V_r = \bar{A}_1(c)\bar{A}_1(-c) \cdots \bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)\bar{A}_r(c)A_r(-c)$. 

These events are mutually exclusive and therefore 

(3.8) $\Pr\{D_N > c/N\} = \sum_{r=1}^{N} \Pr\{U_r\} + \sum_{r=1}^{N} \Pr\{V_r\}$. 

From the very definitions we have the following two fundamental relations 

(3.9) $\Pr\{A_k(c)\} = \sum_{r=1}^{k} \Pr\{U_r\}\Pr\{A_k(c) \mid A_r(c)\} + \sum_{r=1}^{k} \Pr\{V_r\}\Pr\{A_k(c) \mid A_r(-c)\}$, 
      $\Pr\{A_k(-c)\} = \sum_{r=1}^{k} \Pr\{U_r\}\Pr\{A_k(-c) \mid A_r(c)\} + \sum_{r=1}^{k} \Pr\{V_r\}\Pr\{A_k(-c) \mid A_r(-c)\}$. 


This is a system of $2N$ linear equations for the $2N$ unknowns $\Pr\{U_r\}$ and 
$\Pr\{V_r\}$ and we proceed to solve it by the method of generating functions. 

By definition of $x_k$ we have $\Pr\{X_j \le x_k\} = k/N$. The probability of the event 
$A_k(c)$ (that the same inequality holds for exactly $k + c$ different $X$'s) is therefore 
given by 

(3.10) $\Pr\{A_k(c)\} = B(N, k + c; k/N)$ 

(cf. (2.2)). Similarly, it is readily verified that for $r \le k$ 

(3.11) $\Pr\{A_k(c) \mid A_r(c)\} = B(N - r - c, k - r; (k - r)/(N - r))$ 

and 

(3.12) $\Pr\{A_k(c) \mid A_r(-c)\} = B(N - r + c, k - r + 2c; (k - r)/(N - r))$. 

The last three equations hold also for $c < 0$. They can be written in a more 
convenient form in terms of the quantities 

(3.13) $p_k(c) = e^{-k}\,\dfrac{k^{k+c}}{(k + c)!}$. 

In fact 

(3.14) $\Pr\{A_k(c)\} = \dfrac{p_k(c)\,p_{N-k}(-c)}{p_N(0)}$, 

(3.15) $\Pr\{A_k(c) \mid A_r(c)\} = \dfrac{p_{k-r}(0)\,p_{N-k}(-c)}{p_{N-r}(-c)}$, 

(3.16) $\Pr\{A_k(c) \mid A_r(-c)\} = \dfrac{p_{k-r}(2c)\,p_{N-k}(-c)}{p_{N-r}(c)}$. 

If these expressions are introduced into (3.9) the second factor in the numerator 
cancels. A further simplification is achieved on introducing new sets of 
unknowns 

(3.17) $u_r = \Pr\{U_r\}\,\dfrac{p_N(0)}{p_{N-r}(-c)}$, $v_r = \Pr\{V_r\}\,\dfrac{p_N(0)}{p_{N-r}(c)}$. 

The fundamental equations (3.9) then reduce to 

(3.18) $p_k(c) = \sum_{r=1}^{k} u_r\,p_{k-r}(0) + \sum_{r=1}^{k} v_r\,p_{k-r}(2c)$, 
       $p_k(-c) = \sum_{r=1}^{k} u_r\,p_{k-r}(-2c) + \sum_{r=1}^{k} v_r\,p_{k-r}(0)$. 
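
The rewriting of the binomial terms (3.10)-(3.12) as ratios of Poisson-type quantities can be checked numerically. The sketch below assumes the definition $p_k(c) = e^{-k}k^{k+c}/(k+c)!$ and works with logarithms to avoid overflow; the helper names `log_p` and `log_binom_term` and the particular values of $N$, $k$, $r$, $c$ are illustrative choices, not taken from the paper.

```python
import math


def log_p(k: int, c: int) -> float:
    """log of p_k(c) = exp(-k) * k^(k+c) / (k+c)!   (needs k >= 1, k + c >= 0)."""
    return -k + (k + c) * math.log(k) - math.lgamma(k + c + 1)


def log_binom_term(n: int, j: int, prob: float) -> float:
    """log of B(n, j; prob) = C(n, j) * prob^j * (1 - prob)^(n - j), 0 < prob < 1."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(prob) + (n - j) * math.log(1.0 - prob))


N, k, r, c = 50, 20, 5, 3

# (3.10) versus (3.14)
lhs_14 = log_binom_term(N, k + c, k / N)
rhs_14 = log_p(k, c) + log_p(N - k, -c) - log_p(N, 0)

# (3.11) versus (3.15)
lhs_15 = log_binom_term(N - r - c, k - r, (k - r) / (N - r))
rhs_15 = log_p(k - r, 0) + log_p(N - k, -c) - log_p(N - r, -c)

# (3.12) versus (3.16)
lhs_16 = log_binom_term(N - r + c, k - r + 2 * c, (k - r) / (N - r))
rhs_16 = log_p(k - r, 2 * c) + log_p(N - k, -c) - log_p(N - r, c)
```

Each left-hand value should match its right-hand counterpart to rounding error, which is what allows the common factor $p_{N-k}(-c)$ to cancel in (3.9).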

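This system is of the convolution type and can therefore be solved by means 
of generating functions. The essential point is that the $p_k(c)$ are defined for 
all $k$ and that the system (3.18) therefore determines the unknowns $u_r$ and $v_r$ 
for all $r > 0$. We put 

(3.19) $u(\lambda) = \sum_{k=1}^{\infty} u_k \lambda^k$, $v(\lambda) = \sum_{k=1}^{\infty} v_k \lambda^k$ 

and 

(3.20) $p(\lambda; c) = N^{-1/2} \sum_{k=1}^{\infty} p_k(c) \lambda^k$. 

(The factor $N^{-1/2}$ serves to simplify formulas.) Then obviously 

(3.21) $p(\lambda; c) = u(\lambda)\,p(\lambda; 0) + v(\lambda)\,p(\lambda; 2c)$; 
       $p(\lambda; -c) = u(\lambda)\,p(\lambda; -2c) + v(\lambda)\,p(\lambda; 0)$. 

From here we find $u(\lambda)$ and $v(\lambda)$. Equation (3.17) then determines $\Pr\{U_r\}$ and 
$\Pr\{V_r\}$. Actually we are interested only in the two sums occurring in (3.8). 
We put 

(3.22) $\xi_k = \dfrac{1}{p_N(0)} \sum_{r=1}^{k} u_r\,p_{k-r}(-c)$, $\eta_k = \dfrac{1}{p_N(0)} \sum_{r=1}^{k} v_r\,p_{k-r}(c)$. 

Again, the $\xi_k$ and $\eta_k$ are defined for all $k$ (also $k > N$). From (3.17) we have 

(3.23) $\sum_{r=1}^{N} \Pr\{U_r\} = \xi_N$, $\sum_{r=1}^{N} \Pr\{V_r\} = \eta_N$, 

and hence finally, by (3.8) 

(3.24) $\Pr\{D_N > c/N\} = \xi_N + \eta_N$. 

In (3.22) we find again simple convolutions leading to products of the 
corresponding generating functions. Thus 

(3.25) $\xi(\lambda) = \sum_{k=1}^{\infty} \xi_k \lambda^k = \dfrac{N^{1/2}\,u(\lambda)\,p(\lambda; -c)}{p_N(0)}$, 
       $\eta(\lambda) = \sum_{k=1}^{\infty} \eta_k \lambda^k = \dfrac{N^{1/2}\,v(\lambda)\,p(\lambda; c)}{p_N(0)}$. 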

We now pass to a study of the limiting form of these generating functions 
as $N \to \infty$ and $c \to \infty$ in accordance with (3.2). Consider a fixed $t > 0$ and 
suppose that 

(3.26) $\dfrac{k}{N} \to t$. 

From well-known properties of the Poisson distribution it follows then that 

(3.27) $N^{1/2} p_k(c) \to (2\pi t)^{-1/2} \exp(-z^2/2t)$. 

Accordingly, the continuity theorem of section 2 implies (as can be verified 
directly) that 

(3.28) $p(e^{-s/N}; zN^{1/2}) \to (2\pi)^{-1/2} \int_0^\infty t^{-1/2} \exp(-ts - z^2/2t)\,dt = (2s)^{-1/2} \exp(-(2sz^2)^{1/2})$ 

(the last integral is well known and can be evaluated by elementary methods; 
the square-root is always positive). We see in particular that the limiting form 
is the same for $p(\lambda; c)$ and $p(\lambda; -c)$. It follows therefore from (3.21) directly 
that 

(3.29) $\lim_{N\to\infty} u(e^{-s/N}) = \lim_{N\to\infty} v(e^{-s/N}) = \dfrac{\exp(-(2sz^2)^{1/2})}{1 + \exp(-(8sz^2)^{1/2})}$. 

Using this and the fact that $p_N(0) \sim (2\pi N)^{-1/2}$ we conclude from (3.25) that 

(3.30) $\lim N^{-1}\xi(e^{-s/N}) = \lim N^{-1}\eta(e^{-s/N}) = \left(\dfrac{2\pi}{2s}\right)^{1/2} \dfrac{\exp(-(8sz^2)^{1/2})}{1 + \exp(-(8sz^2)^{1/2})} = \psi(s)$. 

Expanding $\psi(s)$ into a geometric series we get 

(3.31) $\psi(s) = \left(\dfrac{2\pi}{2s}\right)^{1/2} \sum_{\nu=1}^{\infty} (-1)^{\nu-1} \exp(-(8s\nu^2 z^2)^{1/2})$. 

From the evaluation of the integral in (3.28) we conclude that $\psi(s)$ is the Laplace 
transform of 

(3.32) $f(t) = t^{-1/2} \sum_{\nu=1}^{\infty} (-1)^{\nu-1} \exp(-2\nu^2 z^2/t)$. 

The continuity theorem of section 2 in conjunction with (3.30) and (3.32) shows 
that 

(3.33) $\lim_{N\to\infty} \xi_N = \lim_{N\to\infty} \eta_N = f(1)$. 

In view of (3.24) this accomplishes the proof. 
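
The integral quoted in (3.28) carries most of the analytic weight in this proof, so a numerical spot-check may be reassuring. The sketch below is an illustration under assumptions: it substitutes $t = x^2$ to remove the $t^{-1/2}$ singularity and applies a plain trapezoidal rule on a truncated range; the function names, the cut-off `upper` and the step count are arbitrary choices, not part of the paper.

```python
import math


def laplace_integral(s: float, z: float, upper: float = 12.0, steps: int = 20_000) -> float:
    """(2 pi)^(-1/2) * integral_0^inf t^(-1/2) exp(-t s - z^2 / (2 t)) dt, for s, z > 0."""
    # With t = x^2 the integral becomes 2 * integral_0^inf exp(-s x^2 - z^2 / (2 x^2)) dx.
    h = upper / steps
    total = 0.0
    for i in range(1, steps):            # integrand is ~0 at both ends of the range
        x = i * h
        total += math.exp(-s * x * x - z * z / (2.0 * x * x))
    return 2.0 * h * total / math.sqrt(2.0 * math.pi)


def closed_form(s: float, z: float) -> float:
    """Right-hand member of (3.28): (2 s)^(-1/2) exp(-(2 s z^2)^(1/2))."""
    return math.exp(-math.sqrt(2.0 * s * z * z)) / math.sqrt(2.0 * s)
```

For example, with $s = z = 1$ both functions give approximately 0.1719.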


4. Proof of Theorem 4. This proof is simpler than the preceding one inasmuch 
as we are now interested only in the events $A_k(c)$ for $c > 0$. This time we 
define $U_r$ as the event that $r$ is the smallest subscript for which $A_r(c)$ occurs, that 
is, $U_r = \bar{A}_1(c)\bar{A}_2(c) \cdots \bar{A}_{r-1}(c)A_r(c)$; no analogue to the event $V_r$ will be used. 
With the same notations as before (3.9) is replaced by 

(4.1) $\Pr\{A_k(c)\} = \sum_{r=1}^{k} \Pr\{U_r\}\Pr\{A_k(c) \mid A_r(c)\}$, 

and hence (3.21) by 

(4.2) $p(\lambda; c) = u(\lambda)\,p(\lambda; 0)$. 

Here $p(\lambda; c)$ is the same as before, so that (cf. (3.29)) 

(4.3) $\lim_{N\to\infty} u(e^{-s/N}) = \exp(-(2sz^2)^{1/2})$. 

Again, the first equation (3.25) holds without change and therefore we get 
instead of (3.30) 

(4.4) $\lim N^{-1}\xi(e^{-s/N}) = \left(\dfrac{2\pi}{2s}\right)^{1/2} \exp(-(8sz^2)^{1/2})$. 

From (3.28) this is the Laplace transform of 

(4.5) $f(t) = t^{-1/2} \exp(-2z^2/t)$. 

As before we conclude that $\xi_N \to f(1)$, which accomplishes the proof. 

5. Proof of Theorem 3. We have seen in section 3 that the intervals in which 
(3.3) holds are in a one-to-one correspondence with the events $A_k(c)$. Hence 

(5.1) $E(A_N) = \sum_{k=1}^{N} \Pr\{A_k(c)\} + \sum_{k=1}^{N} \Pr\{A_k(-c)\}$. 

To evaluate the sums we use (3.10). If $N \to \infty$ and again $c = zN^{1/2}$, $k/N \to t$, 
then by the central limit theorem 

(5.2) $N^{1/2} B(N, k + c; k/N) \to \{2\pi t(1 - t)\}^{-1/2} \exp(-z^2/2t(1 - t))$. 

It follows then from (3.10) that 

(5.3) $N^{-1/2} \sum_{k=1}^{N} \Pr\{A_k(c)\} \to (2\pi)^{-1/2} \int_0^1 \{t(1 - t)\}^{-1/2} \exp(-z^2/2t(1 - t))\,dt$. 

Call the right hand member $R$. After the substitution $t = \sin^2(\phi/2)$ we find 

(5.4) $\dfrac{dR}{dz} = -8(2\pi)^{-1/2} z \int_0^{\pi/2} \sin^{-2}\phi\,\exp(-2z^2/\sin^2\phi)\,d\phi$ 
      $= -8(2\pi)^{-1/2} z \exp(-2z^2) \int_0^\infty \exp(-2z^2\cot^2\phi)\,d(\cot\phi)$ 
      $= -2\exp(-2z^2)$. 

Since $R \to 0$ as $z \to \infty$ we conclude that 

(5.5) $R = 2\int_z^\infty \exp(-2x^2)\,dx = \{1 - \Phi(2z)\}(2\pi)^{1/2}$. 

The same asymptotic estimate holds for the other sum in (5.1), and hence 
Theorem 3 is proved. 
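
The evaluation of $R$ can be checked numerically, and the resulting formula (1.9) for $E(A_N)$ tabulated. The sketch below is an illustration under assumptions: it uses the substitution $t = \sin^2(\phi/2)$ from the text, under which $R = (2\pi)^{-1/2}\int_0^\pi \exp(-2z^2/\sin^2\phi)\,d\phi$, together with a midpoint rule; the function names and the step count are hypothetical choices.

```python
import math


def upper_tail(x: float) -> float:
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def R_closed(z: float) -> float:
    """Closed form (5.5): (2 pi)^(1/2) * {1 - Phi(2 z)}."""
    return math.sqrt(2.0 * math.pi) * upper_tail(2.0 * z)


def R_numeric(z: float, steps: int = 20_000) -> float:
    """Midpoint-rule value of (2 pi)^(-1/2) * integral_0^pi exp(-2 z^2 / sin^2 phi) d phi."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * h               # midpoints avoid sin(phi) == 0
        total += math.exp(-2.0 * z * z / math.sin(phi) ** 2)
    return h * total / math.sqrt(2.0 * math.pi)


def expected_crossings(n: int, z: float) -> float:
    """Asymptotic value 2 (2 pi n)^(1/2) {1 - Phi(2 z)} of E(A_N) from (1.9)."""
    return 2.0 * math.sqrt(2.0 * math.pi * n) * upper_tail(2.0 * z)
```

For $z = 0.5$ both evaluations of $R$ give about 0.398, so that `expected_crossings(100, 0.5)` is close to 8, in line with the remark that $E(A_N)$ grows like $N^{1/2}$.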


6. Proof of Theorem 2. Reorder the two samples in ascending order 
of magnitude and denote the rearranged samples by $(X_1^*, \cdots, X_m^*)$ and 
$(Y_1^*, \cdots, Y_n^*)$. When speaking of the graphs of the empirical distributions 
$S_m(x)$ and $T_n(x)$ we shall suppose that they have been completed by adding 
vertical segments so that the graphs become step-polygons. We shall put 

(6.1) $p = \dfrac{m}{m + n}$, $q = \dfrac{n}{m + n}$. 

Then, according to (1.6) and (1.7) 

(6.2) $\dfrac{p}{q} \to a$, $N = pn = qm$. 





Without loss of generality we shall suppose that 

(6.3) $p \le q$. 

In order to carry over the proof of Theorem 1 it is necessary to define the 
events $A_k(c)$ in a judicious manner. For every integer $k > 0$ let $\nu_k$ be the 
number of variables $X_j$ which are smaller than $Y_k^*$. In other words, $\nu_k$ is defined 
as the integer for which 

(6.4) $X^*_{\nu_k} < Y_k^* < X^*_{\nu_k + 1}$. 

Finally put 

(6.5) $a_k = [pk/q]$, 

where, as usual, $[x]$ denotes the greatest integer contained in $x$. 

For $0 < k \le n$ let $A_k(c)$ be the event that 

(6.6) $\nu_k = a_{k+c}$. 

The possibility of applying the proof of section 3 depends on the following 
Lemma. Whenever 

(6.7) $D_{m,n} > \dfrac{c}{n} > 0$ 

then at least one among the events $A_1(c), A_1(-c), \cdots, A_n(c), A_n(-c)$ occurs. 
Conversely, if one of these events occurs then 

(6.8) $D_{m,n} \ge \dfrac{c}{n} - \dfrac{1}{m}$. 

Proof. If (6.7) holds then either for some $x_0$ 

(6.9) $S_m(x_0) - T_n(x_0) > \dfrac{c}{n}$ 

or the reversed inequality holds with $c$ replaced by $-c$. It suffices to consider 
the case (6.9). For sufficiently large $x$ we have $S_m(x) = T_n(x) = 1$. Hence the 
graphs of $S_m(x)$ and $T_n(x) + c/n$ must intersect at an abscissa $\xi > x_0$. The 
point of intersection lies necessarily on a horizontal segment of the graph of 
$S_m(x)$ and a vertical segment of $T_n(x) + c/n$. Hence there exists a $k$ such that 
$\xi = Y_k^*$ and, moreover, 

(6.10) $T_n(\xi-) + \dfrac{c}{n} < S_m(\xi) \le T_n(\xi+) + \dfrac{c}{n}$. 

This amounts to saying that 

(6.11) $\dfrac{k - 1 + c}{n} < \dfrac{\nu_k}{m} \le \dfrac{k + c}{n}$. 

In view of (6.3) and (6.5) this relation implies (6.6). 





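Conversely, suppose that the event $A_k(c)$ occurs and let $c > 0$. Put again 
$\xi = Y_k^*$. Then, by definition, 

(6.12) $S_m(\xi) = \dfrac{\nu_k}{m} = \dfrac{a_{k+c}}{m}$, $T_n(\xi) = \dfrac{k}{n}$. 

It follows that 

(6.13) $S_m(\xi) > \dfrac{k + c}{n} - \dfrac{1}{m} = T_n(\xi) + \dfrac{c}{n} - \dfrac{1}{m}$, 

which in turn implies (6.8). This proves the lemma. 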

Theorem 2 is concerned with values of $c$ such that $cn^{-1} = zN^{-1/2}$; in passing to 
the limit we must therefore put 

(6.14) $c = z(n/p)^{1/2}$. 

Accordingly, the relations (6.7) and (6.8) are asymptotically equivalent and our 
lemma shows that, asymptotically, the probability of (6.7) is the same as the 
probability that at least one among the events $A_1(c), \cdots, A_n(-c)$ occurs. 
To evaluate this probability we proceed exactly as in section 3. The events 
$U_r$ and $V_r$ defined by (3.7) and the fundamental relations (3.9) hold again. 
However (3.10)-(3.12) have to be replaced by new evaluations. 

It is easily seen that the probability that exactly $r$ among the $X_j$ are smaller 
than $Y_k^*$ is the same as the probability to extract exactly $r$ white balls before the 
$k$-th black ball from an urn containing $m$ white and $n$ black balls (assuming 
that all orders are equally likely and that balls are not replaced). In this way 
one finds 

(6.15) $\Pr\{A_k(c)\} = \dfrac{C(a_{k+c} + k - 1, k - 1)\,C(m + n - a_{k+c} - k, n - k)}{C(m + n, n)}$, 

(6.16) $\Pr\{A_k(c) \mid A_r(c)\} = \dfrac{C(a_{k+c} - a_{r+c} + k - r - 1, k - r - 1)\,C(m + n - a_{k+c} - k, n - k)}{C(m + n - a_{r+c} - r, n - r)}$, 

(6.17) $\Pr\{A_k(c) \mid A_r(-c)\} = \dfrac{C(a_{k+c} - a_{r-c} + k - r - 1, k - r - 1)\,C(m + n - a_{k+c} - k, n - k)}{C(m + n - a_{r-c} - r, n - r)}$. 
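
The urn argument behind (6.15) can be verified by brute force for small $m$ and $n$. The sketch below is an illustration under assumptions: it enumerates every arrangement of $m$ white and $n$ black balls, counts those in which exactly $a$ white balls precede the $k$-th black ball, and compares the frequency with the product of binomial coefficients in (6.15), with a generic integer $a$ standing in for $a_{k+c}$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def urn_formula(m: int, n: int, k: int, a: int) -> Fraction:
    """C(a+k-1, k-1) * C(m+n-a-k, n-k) / C(m+n, n), as in (6.15) with a for a_{k+c}."""
    return Fraction(comb(a + k - 1, k - 1) * comb(m + n - a - k, n - k),
                    comb(m + n, n))


def urn_bruteforce(m: int, n: int, k: int, a: int) -> Fraction:
    """Exact probability that exactly `a` white balls precede the k-th black ball."""
    hits = total = 0
    for blacks in combinations(range(m + n), n):   # increasing positions of black balls
        total += 1
        # The k-th black ball sits at 0-based position blacks[k-1]; of the balls
        # before it, k-1 are black, so the rest are white.
        if blacks[k - 1] - (k - 1) == a:
            hits += 1
    return Fraction(hits, total)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerance.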

The second binomial coefficient in the numerator is common to the three 
expressions and cancels when the expressions are introduced into (3.9). These 
fundamental relations assume a more natural form if the occurring binomial 
coefficients are enlarged to terms of a binomial distribution. It is easily 
verified that the first of the equations (3.9) reduces to 

(6.18) $\dfrac{B(a_{k+c} + k - 1, k - 1; q)}{B(m + n, n; q)} = \sum_{r=1}^{k} \Pr\{U_r\}\,\dfrac{B(a_{k+c} - a_{r+c} + k - r - 1, k - r - 1; q)}{B(m + n - a_{r+c} - r, n - r; q)}$ 
       $+ \sum_{r=1}^{k} \Pr\{V_r\}\,\dfrac{B(a_{k+c} - a_{r-c} + k - r - 1, k - r - 1; q)}{B(m + n - a_{r-c} - r, n - r; q)}$. 


The second equation is obtained on replacing the combination $k + c$ by 
$k - c$. 

Instead of (3.17) we put 

(6.19) $u_r = \Pr\{U_r\}\,\dfrac{B(m + n, n; q)}{B(m + n - a_{r+c} - r, n - r; q)}$, 
       $v_r = \Pr\{V_r\}\,\dfrac{B(m + n, n; q)}{B(m + n - a_{r-c} - r, n - r; q)}$. 

Then (6.18) becomes 

(6.20) $B(a_{k+c} + k - 1, k - 1; q) = \sum_{r=1}^{k} u_r\,B(a_{k+c} - a_{r+c} + k - r - 1, k - r - 1; q)$ 
       $+ \sum_{r=1}^{k} v_r\,B(a_{k+c} - a_{r-c} + k - r - 1, k - r - 1; q)$. 

This corresponds to the first equation in (3.18). Unfortunately (6.20) is not 
of the pure convolution type since $a_{k+c} - a_{r+c}$ and $a_{k+c} - a_{r-c}$ are not functions 
of the two variables $k - r$ and $c$. The trouble comes from the fact that $a_k$, 
as defined by (6.5), is not a linear function of $k$. It is, however, plausible that 
we shall commit only an asymptotically negligible error if we omit the brackets 
in (6.5), that is, if we replace $a_k$ by $pk/q$. Purely formally (6.20) then reduces 
to the first equation in (3.18) with 

(6.21) $p_k(c) = B\!\left(\dfrac{k + pc}{q} - 1,\; k - 1;\; q\right)$. 

(Here the first argument in the right hand member is no longer necessarily an 
integer, and the factorials in the definition (2.2) should be interpreted by means 
of the gamma function.) To the new system (3.18) the considerations of section 
3 apply almost word for word: the only difference lies in the new norming (6.14) 
(which replaces (3.2)) and that instead of (3.26) we shall naturally let $k/n \to t$. 
Thus the limiting form of Theorem 1 applies to the new system (3.18) with $p_k(c)$ 
defined by (6.21). 

It remains to prove that the formal replacement of (6.20) and the corresponding 
equation for $-c$, by (3.18) was legitimate. Now all coefficients 
in (6.20) are of the form $B(\nu, r; q)$, and we have only changed the first argument, 
$\nu$, adding a variable quantity which in no case exceeds one unit. In passing to 
the limit we put $k \sim tn$ and $c \sim zn^{1/2}p^{-1/2}$. It follows that we actually use only 
coefficients $B(\nu, r; q)$ where $\nu \to \infty$, $r \to \infty$ and $r/\nu \to q$. Accordingly, for $|\theta| < 1$ 
we have $B(\nu + \theta, r; q) \sim B(\nu, r; q)$, and it is rather obvious that our system 
(6.20) is asymptotically equivalent to (3.18). 





REFERENCES 

[1] A. Kolmogoroff, "Sulla determinazione empirica di una legge di distribuzione," 
Inst. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 1-11. 

[2] A. Kolmogoroff, "Über die Grenzwertsätze der Wahrscheinlichkeitsrechnung," 
Bulletin [Izvestija] Académie des Sciences URSS, (1933), pp. 363-372. 

[3] A. Kolmogoroff, "Confidence limits for an unknown distribution function," Annals 
of Math. Stat., Vol. 12 (1941), pp. 461-463. 

[4] R. von Mises, Wahrscheinlichkeitsrechnung, F. Deuticke, Leipzig und Wien, 1931, p. 316 
seq. 

[5] N. Smirnov, "Ob uklonenijah empiričeskoi krivoi raspredelenija," Recueil 
Mathématique (Matematičeskii Sbornik), N. S. Vol. 6(48) (1939), pp. 3-26. 

[6] N. Smirnov, "On the estimation of the discrepancy between empirical curves of 
distribution for two independent samples," Bulletin Mathématique de l'Université 
de Moscou, Vol. 2 (1939), fasc. 2. 



APPLICATION OF RECURRENT SERIES IN RENEWAL THEORY 
By Alfred J. Lotka 

Metropolitan Life Insurance Company 

Summary. The application of integral equations to renewal theory in population 
analysis and problems of industrial replacement is beset with certain difficulties 
which have been particularly discussed by W. Feller (these Annals, 1941, 
vol. 12, pp. 243-267). Some of these difficulties are avoided if the data of the 
problem are introduced into the analysis directly in the discontinuous form 
(tabulated by class intervals) in which they are usually supplied in a concrete 
case. A numerical example based on population statistics is presented, illustrating 
how, using discontinuous data, a recurrent series takes the place of the integral 
equation, and a finite exponential series appears in place of the Heaviside 
expansion of the previous solution. There is close analogy with the procedure 
previously presented, but with factorial moments appearing in place of ordinary 
moments. 

The fundamental data being given for values of the replacement function at 
discrete intervals only, some question arises as to the applicability of the solution 
as an "interpolation" formula for non-integral values of the time t, and as to the 
effects of subdividing the class interval of the original data. 

In the actual computation of the factorial moments a shift of origin by one-half 
class interval becomes necessary. An algorithm for effecting this shift is 
presented. 


1. Methodology: Alternatives Available. 


All application of mathematics to concrete situations involves a greater or less 
degree of conventionalisation, a substitution, "in place of intractable reality, of 
an ideal upon which it is possible to operate."¹ 

This conventionalisation may be only such as to do little violence to the concrete 
data, as for example when, dealing with a large population, we treat the 
number N(t) of individuals at time t as a continuous variable, knowing perfectly 
well that strictly speaking it varies by jumps of one unit at a time.² 

In dealing with any particular concrete case there may be considerable choice 
as to the mode in which the conventionalisation or idealisation is carried out, 
and the particular place or step in the scheme at which it is introduced. A good 
illustration of this is met in the treatment of renewal theory, as applied to human 
populations or other biological or industrial aggregates. 

The majority of authors who have dealt with the subject have set up their 
fundamental equations in terms of continuous variables. Many have gone further 
than this in the process of conventionalisation and have assumed for the 
renewal function (net reproductivity) some more or less appropriate mathematical 
expression, such as a Charlier or a Pearson [1] frequency distribution, and 
have, wherever possible, carried out by standard methods the integrations 
involved. 

¹ Nature, Vol. 110 (1922), p. 764. 

² If the population is subject to extreme variation in numbers, such that N(t) passes 
through small values, this disregard of their discontinuity may not be permissible. 

Others, while retaining the formulation of the fundamental equations in con¬ 
tinuous (infinitesimal) form, have made no specific assumptions regarding the 
analytical form of the renewal function, and have carried out the numerical in¬ 
tegration by one of the established methods available for the approximate in¬ 
tegration of arbitrary functions. 

But there has also been a minority of authors who deemed it most appropriate,
since the data of the problem are actually furnished in tabular (and hence
discontinuous) form, to apply from the start discontinuous methods in formulating
the fundamental equation for the problem. This equation then defines a recurrent
series.

The most recent and also the most concise exposition of this approach to the
problem is a paper by W. Dobbernack and G. Tietz presented at the Twelfth
International Congress of Actuaries, 1940, Proceedings, vol. 4, p. 233. These
authors, however, do not give any numerical application, and in consequence
certain aspects of the analysis are not touched upon by them. A more detailed
presentation, including numerical applications, was given by the late S. D. Wicksell³
who, however, used only roughly approximate data (an over-all average net
reproductivity for ages 20 to 44) and also introduced certain linear interpolations
which would not be appropriate with more exact data, and which become
unnecessary in the numerical operations if moments are introduced as indicated in
what follows.

The purpose of the present paper is to exhibit this modification of the method
of recurrent series, and at the same time to illustrate its relation to the method
which proceeds in terms of a continuous variable, leading to an integral equation.

The B(t − a) women born in the calendar year (t − a), that is, between the
times (t − ½ − a) and (t + ½ − a), will be a years old some time during the
calendar year t, that is, between t − ½ and t + ½. If their births were evenly
distributed over the year t − a, so will their birthdays of age a be over the year
t, and their average age during that year will be a and the average number of
survivors to that age during the year t will be approximately B(t − a)p(a), where
p(a) is the probability, at birth, of surviving to age a. If the annual female
reproductive rate, counting daughters only, is m(a) at age a, then the
B(t − a)p(a) survivors will, during the calendar year t, give birth to B(t − a)p(a)m(a)
daughters. If B(t) is the total number of births of daughters in the calendar
year t, then evidently, for positive values of t,

(1)  $B(t) = \sum_{a=1}^{\omega} B(t-a)\,p(a)\,m(a)$

or, to simplify the notation by writing $c_a = p(a)m(a)$,

(2)  $B(t) = \sum_{a=1}^{\omega} c_a B(t-a)$.

³ [2]; see also [3].

Equation (1) or (2) defines a recurrent series of the general form

(3)  $B(t) = c_1 B(t-1) + c_2 B(t-2) + \cdots + c_\omega B(t-\omega)$,

where some of the coefficients c may be zero and where ω denotes the upper limit
of the reproductive period.

The trial substitution

(4)  $B(t) = Q x^{-t}$

in (3) gives

(5)  $1 = c_1 x + c_2 x^2 + \cdots + c_\omega x^\omega$.

The substitution (4) therefore satisfies (3) provided that x is a root of the
equation (5) of degree ω for x; and the same is evidently true for the more general
substitution

(6)  $B(t) = \sum_{j=1}^{\omega} Q_j x_j^{-t}$,

where $x_j$, with j = 1, 2, ..., ω, are the ω roots of (5).

Equation (5) leaves the ω coefficients $Q_j$ indeterminate. In general they
appear as arbitrary constants. In any concrete application they may be
determined by “initial” conditions; that is, in order to make the problem determinate,
it is necessary to be given the values of B(t) for ω successive integral values of t,
or some equivalent data.
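As an illustration of how the recurrent series (3) is carried forward once initial values are given, the following is a minimal sketch in Python. It assumes the single-cohort case (B(0) births at t = 0 and none before) and takes the coefficients $c_a$ for integral a from Table 2 of the numerical illustration; the function name `step_recurrence` is hypothetical and chosen only for this example.

```python
# Net reproductivity c_a = p(a) m(a) for integral a = 1..11 (five-year units),
# as listed for integral ages in Table 2 (c_1 and c_2 taken as zero there).
C = [0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427,
     0.18963, 0.10607, 0.02268, 0.00116, 0.00001]


def step_recurrence(c, initial_births, steps):
    """Carry B(t) = sum_a c_a B(t - a) forward from a single cohort at t = 0."""
    omega = len(c)
    # The last entry of `history` is always the most recent B; earlier births are zero.
    history = [0.0] * (omega - 1) + [float(initial_births)]
    for _ in range(steps):
        nxt = sum(c[a - 1] * history[-a] for a in range(1, omega + 1))
        history.append(nxt)
    return history[omega - 1:]  # B(0), B(1), ..., B(steps)


births = step_recurrence(C, 100_000, 8)
for t, b in enumerate(births):
    print(t, round(b))
```

With these assumed inputs the first non-zero births after the initial cohort appear at t = 3, the youngest age with a non-zero coefficient.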

While, for convenience in description, the analysis has been developed in 
terms of the year as time unit, the formulae are evidently independent of this 
choice of unit, provided that the unit employed is adequate for practical appli¬ 
cation. 

Whatever the unit employed, for the direct application of (1) and (3) to a concrete
case it is necessary to have the data in such form that values of p(a)m(a)
are known for integral values of a. The pertinent statistics do not usually come
in that form, the fertility being usually known only for five-year age groups, and
though it may be sufficient for practical purposes to regard those quinquennial
values as representing p(a)m(a) for the midpoint of the group, this yields
p(a)m(a) for fractional values of a, as measured in five-year units. We may then
proceed as follows: putting


(7)  $x = 1 + y$

in (5) this becomes

(8)  $1 = \{c_1 + c_2 + c_3 + \cdots + c_\omega\}$
        $+ \{c_1 + 2c_2 + 3c_3 + \cdots + \omega c_\omega\}\,y$
        $+ \{c_2 + 3c_3 + 6c_4 + \cdots + \tfrac{\omega(\omega-1)}{2!}\,c_\omega\}\,y^2$
        $+ \{c_3 + 4c_4 + 10c_5 + \cdots + \tfrac{\omega(\omega-1)(\omega-2)}{3!}\,c_\omega\}\,y^3$
        $+ \cdots + \{c_\omega\}\,y^\omega$
      $= \sum_{h=0}^{\omega} \sum_{k=0}^{\omega-h} \binom{h+k}{h}\,c_{h+k}\,y^h$.

In application to a particular population, we shall usually have the condition
$c_a = 0$ for $a = 1, 2, \ldots < \alpha$,
where α is the lower limit of the reproductive period.

The expressions in brackets (coefficients of successive powers of y) will be
recognized as cumulations $S_h$ of the values of the function $c_a$, summed backwards to
the “diagonal” element $c_h$, where h is the exponent of y. In terms of moments
m of the function $c_a$, equation (8) can be written

(9)  $1 = m_0 + m_1 y + \frac{m_2 - m_1}{2!}\,y^2 + \frac{m_3 - 3m_2 + 2m_1}{3!}\,y^3 + \cdots + c_\omega y^\omega$


or, using the symbol $m_{[h]}$ to denote the hth factorial moment, equation (9) takes
the simple form

(10)  $1 = \sum_{h=0}^{\omega} \frac{m_{[h]}}{h!}\,y^h$.


In these expressions the moments $m_h$ and $m_{[h]}$ are those taken about a = 0.
Actually, the net reproduction rates are given for “semi-values” of a, that is, for
values of a which are odd multiples of ½ (using five years as the time unit). By
cumulation of these given values moments $m'_h$ and $m'_{[h]}$ about a = −½ are
obtained.⁴ From the latter the corresponding functions of the moments about
a = 0 are obtained by the transformation formulae⁵

(11)  $\frac{m_{[h]}}{h!} = \sum_{k=0}^{h} \frac{(-\tfrac12)^{[k]}}{k!}\,\frac{m'_{[h-k]}}{(h-k)!}$,  that is,  $S_h = \sum_{k=0}^{h} \frac{(-\tfrac12)^{[k]}}{k!}\,S'_{h-k}$,  with $S'_h = m'_{[h]}/h!$.

⁴ In these cumulations zero values of $c_a$ for 0 < a < α must not be omitted.

⁵ In accordance with a customary notation the symbol $(-\tfrac12)^{[k]}$ denotes the continued
product $-\tfrac12(-\tfrac12-1)(-\tfrac12-2)\cdots(-\tfrac12-k+1)$. In the computation of successive terms
in the sums in the right-hand member of (11), by appropriately laying out the work,
advantage is taken of the fact that values of $(-\tfrac12)^{[k]}/k!$ for k = 2, 3, ... are obtained each
from the preceding by multiplying successively by ¾, ⅚, etc., and taking care of the sign,
so that fractions with complicated numerators and denominators are avoided.


It will be recalled that in the treatment of the problem of replacement by
means of an integral equation,⁶ a solution in the form

(12)  $B(t) = \sum_j Q_j x_j^{-t} = \sum_j Q_j e^{r_j t}$

is obtained, in which the exponential coefficients $r_j$ are the roots of the equation

(13)  $1 = \int_\alpha^\omega e^{-ra}\,p(a)m(a)\,da = \int_\alpha^\omega x^{a}\,p(a)m(a)\,da$,

i.e.

(14)  $1 = m_0 - m_1 r + \frac{m_2}{2!}\,r^2 - \frac{m_3}{3!}\,r^3 + \cdots = \sum_{h=0}^{\infty} (-1)^h \frac{m_h}{h!}\,r^h$,

in close analogy to equation (10) for y, with the distinction however that in (10)
the factorial moments take the place of the ordinary moments of (14), and that
the series in (10) is finite, terminating at the term in $y^\omega$. There is also an
important difference between the characteristic equation (13) and its analogue (5),
namely that (5) may admit of negative roots for x, whereas (13) does not admit
negative values for x.


2. The constants Q. These are determined by initial conditions, as follows.
Equation (2) can be written

(15)  $B(t) = \sum_{a=t}^{\omega} c_a B(t-a) + \sum_{a=1}^{t-1} c_a B(t-a) = F(t) + \sum_{a=1}^{t-1} c_a B(t-a)$,

with

(16)  $F(t) = \sum_{a=t}^{\omega} c_a B(t-a)$ for $0 < t \le \omega$, and $F(t) = 0$ for $t > \omega$.

The values of B(t) being given for integral values of t, from t = −(ω − 1) to
t = 0, it can be shown that⁷

(17)  $Q_j = \dfrac{\sum_{t=1}^{\omega} F(t)\,x_j^{\,t}}{\sum_{a=1}^{\omega} a\,c_a\,x_j^{\,a}}$.

⁶ For a discussion of the limits of applicability of this method see [4].

⁷ The reasoning is essentially the same as in the treatment of the problem by integral
equations. See [6] and [2, p. 39 et seq.].


In the special case that we are tracing the progeny of an initial population all
born at the same time, say B(0) births occurring at t = 0, so that

(18)  $B(-1) = B(-2) = \cdots = B(-[\omega - 1]) = 0$

the expression for $Q_j$, in view of (5), reduces to a particularly simple form.
For if we write the summation in equation (16) in expanded form, we have

(19)  $F(1) = c_1 B(0) + c_2 B(-1) + c_3 B(-2) + c_4 B(-3) + \cdots + c_\omega B(-[\omega - 1])$,
      $F(2) = c_2 B(0) + c_3 B(-1) + c_4 B(-2) + \cdots + c_\omega B(-[\omega - 2])$,
      $F(3) = c_3 B(0) + c_4 B(-1) + \cdots + c_\omega B(-[\omega - 3])$,
      ...
      $F(\omega) = c_\omega B(0)$.

If now B(−1), ..., B(−[ω − 1]) all vanish, then

(20)  $\sum_{t=1}^{\omega} F(t)\,x^t = B(0)\,(c_1 x + c_2 x^2 + c_3 x^3 + \cdots + c_\omega x^\omega)$

(21)  $= B(0)$ by (5).

Hence,

(22)  $Q_j = \dfrac{B(0)}{\sum_{a=1}^{\omega} a\,c_a\,x_j^{\,a}}$.

In particular

(23)  $B(0) = \sum_{j=1}^{\omega} Q_j = B(0) \sum_{j=1}^{\omega} \dfrac{1}{\sum_{a=1}^{\omega} a\,c_a\,x_j^{\,a}}$

so that

(24)  $\sum_{j=1}^{\omega} \dfrac{1}{\sum_{a=1}^{\omega} a\,c_a\,x_j^{\,a}} = 1$.

The constant B(0) here evidently functions essentially as an arbitrary unit
of annual births, and may with this understanding simply be put = 1, thereby
simplifying the notation. This has been done in what follows, where convenient,
especially in the table of constants, Table 3 of the numerical illustration.
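A small worked case may make relations (22) and (24) concrete. The sketch below is not taken from the paper: it assumes a hypothetical two-term series with $c_1 = c_2 = 1/2$, for which the characteristic equation (5) is a quadratic with roots x = 1 and x = −2, and checks that the constants $Q_j$ sum to unity and that the series (6) reproduces the recurrence. The function names are illustrative only.

```python
import cmath


def q_constants(c, roots, b0=1.0):
    """Q_j = B(0) / sum_a a c_a x_j^a, for a single cohort of B(0) births at t = 0."""
    return [b0 / sum(a * ca * x ** a for a, ca in enumerate(c, start=1))
            for x in roots]


def series_births(qs, roots, t):
    """B(t) = sum_j Q_j x_j^(-t), the exponential-series solution."""
    return sum(q * x ** (-t) for q, x in zip(qs, roots))


# Hypothetical coefficients: c_1 = c_2 = 1/2, so that x/2 + x^2/2 = 1.
c = [0.5, 0.5]
disc = cmath.sqrt(c[0] ** 2 + 4 * c[1])
roots = [(-c[0] + disc) / (2 * c[1]), (-c[0] - disc) / (2 * c[1])]  # 1 and -2
qs = q_constants(c, roots)

# Direct recurrence from B(0) = 1 with no earlier births:
# B(1) = c_1 B(0) = 0.5 and B(2) = c_1 B(1) + c_2 B(0) = 0.75.
print(qs, series_births(qs, roots, 1), series_births(qs, roots, 2))
```

In this toy case the negative root x = −2 contributes a term that alternates in sign and is halved at each step, which is the kind of rapidly damped behaviour discussed for negative roots in Section 5.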

The denominator in (17) or (23) can be evaluated for any root $x_j$ of (5) by direct
summation if the coefficients $c_a$ are given or have been computed (as indicated
below) for integral values of a; or, in a manner similar to that employed in passing
from equation (5) to (8), the denominator can be expressed in terms of the
corresponding roots $y_j = x_j - 1$ of (8) or (10), the cumulations of $c_a$ being replaced
by cumulations of $a c_a$. With the denominator so expressed, the constants
$Q_j$ take the form, in obvious analogy to equation (9),

(25)  $Q_j = \dfrac{B(0)}{m_1 + m_2\,y_j + \frac{m_3 - m_2}{2!}\,y_j^2 + \cdots}$

The alternative procedure, to which reference was made in the preceding paragraph,
is to operate upon the moments $m_{[h]}$ (taken about the origin 0) by a
process the inverse of cumulation, which we might term decumulation, and
in this way to obtain from them the coefficients $c_a$. The polynomial $\sum a\,c_a x^a$
can then be evaluated directly.

The decumulation is readily carried out by an algorithm which suggests itself
from the schedule of cumulation. Analytically the relation between the two
processes is expressed by the reciprocal sets of transformation formulae:

Cumulation

(26)  $\frac{m_{[h]}}{h!} = \sum_{k=0}^{\omega-h} \binom{h+k}{h}\,c_{h+k}$

Decumulation

(27)  $c_h = \sum_{k=0}^{\omega-h} (-1)^k \binom{h+k}{h}\,\frac{m_{[h+k]}}{(h+k)!}$.

3. Constants Q associated with complex roots $x = e^{-(u+iv)}$.

The complex roots $x_j$ give rise to oscillatory terms which, in the special case
of the progeny of a cohort of B(0) births, take the form⁸

$\dfrac{2B(0)\,e^{ut}}{G^2 + H^2}\,[\,G\cos vt - H\sin vt\,]$,

where

$G = \sum_{a=1}^{\omega} a\,c_a\,e^{-ua}\cos va$

and

$H = \sum_{a=1}^{\omega} a\,c_a\,e^{-ua}\sin va$.

These constants may be evaluated directly in this form, or, putting y = F. + i„ 





(a) Numerical Illustration. For convenience and to furnish the opportunity
for comparison, the same data (United States 1920) were here employed as in
the writer’s earlier publications in which the problem was treated by the
application of an integral equation.

(b) Cumulation for values of $m_{[h]}/h!$. The two operations, of (1) cumulating the
values of $c_a$ given for semi-values of a, and (2) allowing in the cumulated results
for a shift of origin from a = −½ to a = 0, can be conducted in one schedule as
in Table 1. Cumulation is first carried out in the usual manner from the bottom
line to the diagonal, with the result appearing immediately below the diagonal.
From here on the procedure is as in the following example: Starting at the lower
right hand corner, we find

.00780 × (−½) = −.00390

.12770 × (−½) = −.06385;   −.06385 × (−¾) = .04789

.97395 × (−½) = −.48698;   −.48698 × (−¾) = .36524;   .36524 × (−⅚) = −.30437

Fig. 1. Net Fertility p(a)m(a). White females, United States, 1920.
The verticals drawn in full and centered at mid-ages represent the original data; those
drawn in dashed lines and centered at integral ages are interpolated.


TABLE 1

Computation schedule for values of $m_{[h]}/h! = S_h$ of net productivity function $p(a)m(a) = c_a$
for integral values of age a*

a in 5-year     c_a       m[0]       m[1]     m[2]/2!    m[3]/3!    m[4]/4!    m[5]/5!
units
   (1)          (2)        (3)        (4)        (5)        (6)        (7)        (8)

                         1.16635    6.64127   16.64550   24.34106   23.16864   15.05650
                                    -.58318     .43738    -.36448     .31892    -.28703
  0-1         .00000     1.16635    7.22445   -3.61223    2.70917   -2.25764    1.97544
  1-2         .00000     1.16635    6.05810   19.82035   -9.91018    7.43264   -6.19387
  2-3         .00040     1.16635    4.89175   13.76225   31.90655  -15.95328   11.96496
  3-4         .09630     1.16595    3.72540    8.87050   18.14430   33.62800  -16.81400
  4-5         .31255     1.06965    2.55945    5.14510    9.27380   15.48370   24.41095
  5-6         .31025      .75710    1.48980    2.58565    4.12870    6.20990    8.92725
  6-7         .23170      .44685     .73270    1.09585    1.54305    2.08120    2.71735
  7-8         .15090      .21515     .28585     .36315     .44720     .53815     .63615
  8-9         .05795      .06425     .07070     .07730     .08405     .09095     .09800
  9-10        .00615      .00630     .00645     .00660     .00675     .00690     .00705
 10-11        .00015      .00015     .00015     .00015     .00015     .00015     .00015

a in 5-year   m[6]/6!    m[7]/7!    m[8]/8!    m[9]/9!  m[10]/10!  m[11]/11!    Factor
units
   (1)          (9)       (10)       (11)       (12)       (13)       (14)       (15)

              6.72500    1.99717     .36404     .03483     .00127     .00001
               .26311    -.24432     .22905    -.21633     .20551    -.19617    -21/22
  0-1        -1.77790    1.62974   -1.51333    1.41875   -1.33993    1.27293    -19/20
  1-2         5.41964   -4.87768    4.47121   -4.15184    3.89235   -3.67611    -17/18
  2-3        -9.97080    8.72445   -7.85201    7.19768   -6.68356    6.26584    -15/16
  3-4        12.61050  -10.50875    9.19516   -8.27564    7.58600   -7.04414    -13/14
  4-5       -12.20548    9.15411   -7.62843    6.67488   -6.00739    5.50677    -11/12
  5-6        12.38595   -6.19298    4.64474   -3.87062    3.38679   -3.04811     -9/10
  6-7         3.45870    4.31260   -2.15630    1.61723   -1.34769    1.17923     -7/8
  7-8          .74135     .85390     .97395    -.48698     .36524    -.30437     -5/6
  8-9          .10520     .11255     .12005     .12770    -.06385     .04789     -3/4
  9-10         .00720     .00735     .00750     .00765     .00780    -.00390     -1/2
 10-11         .00015     .00015     .00015     .00015     .00015     .00015

* Figures immediately below the diagonal, obtained by cumulation from the bottom
upward of the data in Column 2, are factorial moments about a = −½. Figures in the top
line are factorial moments about a = 0. For use of factors in the last column see text.


The several columns are thus completed, and by addition, in each column, of 
the item immediately below the diagonal, and of all the items above the diag¬ 
onal, the figures in the top line are obtained. These are the coefficients of equa¬ 
tion (10) for y. 
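The schedule of Table 1 can be mimicked in a few lines. The sketch below is an illustration under stated assumptions rather than the layout used in the paper: it forms the cumulations about a = −½ directly as binomial sums of the Column 2 data, then applies the shift of origin (11), generating the weights $(-\tfrac12)^{[k]}/k!$ by the successive factors −1/2, −3/4, −5/6, ... of the last column. The function name `shifted_cumulations` is hypothetical.

```python
from math import comb

# Column 2 of Table 1: c_a at the semi-values a = 0.5, 1.5, ..., 10.5.
C_SEMI = [0.0, 0.0, 0.00040, 0.09630, 0.31255, 0.31025,
          0.23170, 0.15090, 0.05795, 0.00615, 0.00015]


def shifted_cumulations(c_semi):
    """Return S_h = m_[h]/h! about a = 0 from data given at semi-values of a."""
    n = len(c_semi)
    # Cumulations about a = -1/2: the datum at a = i + 1/2 lies i + 1 units from that origin.
    s_prime = [sum(comb(i + 1, h) * c for i, c in enumerate(c_semi))
               for h in range(n + 1)]
    # Weights (-1/2)^[k] / k!, each obtained from the preceding one.
    weights = [1.0]
    for k in range(n):
        weights.append(weights[-1] * -(2 * k + 1) / (2 * k + 2))
    return [sum(weights[k] * s_prime[h - k] for k in range(h + 1))
            for h in range(n + 1)]


S = shifted_cumulations(C_SEMI)
for h, value in enumerate(S):
    print(h, round(value, 5))
```

Because the sketch carries full floating-point precision instead of rounding each product to five decimals, its results may differ from the top line of Table 1 by a few units in the fifth decimal place.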





(c) Decumulation. While it is not necessary to carry out the decumulation, since
the entire computation can, if desired, be carried out in terms of y’s and m’s,
there is a considerable interest in noting the values $c_a$ for integral values of a
which result from the decumulations of the m’s. These, together with the
original values for semi-values of a, are shown in Table 2 and Fig. 1.


TABLE 2

Values of $c_a = p(a)m(a)$

(1) for semi-values of a; original data.

(2) for integral values of a; computed by cumulation of original data, shift of
origin, and decumulation.

a (5-year units)    c_a        a (5-year units)    c_a        a (5-year units)    c_a

      0.0           0                4.0          .21781            8.0          .10607
      0.5           0                4.5          .31255            8.5          .05795
      1.0           0*               5.0          .33400            9.0          .02268
      1.5           0                5.5          .31025            9.5          .00615
      2.0           0*               6.0          .27427           10.0          .00116
      2.5          .00040            6.5          .23170           10.5          .00015
      3.0          .02073*           7.0          .18963           11.0          .00001
      3.5          .09630            7.5          .15090

* The value of $c_2$ came out negative, namely −.00057, and the value of $c_1$ came out
+.00014. In the computation of $\sum a\,c_a x^a$ these two values were arbitrarily adjusted to
zero, and $c_3$ was diminished from .02118 to .02073 to make the total
$\sum_{a=1}^{11} c_a = 1.16635$, summing only for integral values of a.

4. The roots of equations (5) and (8).

From the prior study already cited, the real positive and three pairs of complex
roots for r of the characteristic equation

(31)  $\int_\alpha^\omega x^{a}\,p(a)m(a)\,da = \int_\alpha^\omega e^{-ra}\,p(a)m(a)\,da = 1$

were known. These were used to indicate the approximate location of the roots
of (5) or (8), and more exact values were then obtained by Newton’s method of
successive approximation. Table 3 shows the values of u, v, etc., corresponding
to the new roots

$y = x - 1 = e^{-r} - 1$

obtained through equations (8) or (10), and, for comparison, the corresponding
values obtained in the previous publication from equation (13). The same
table also exhibits the remaining roots and values of the constants Q, G, H.
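Newton's method of successive approximation, as applied to the characteristic equation (5), can be sketched as follows. This is an assumed minimal implementation, not the computing layout of the paper: it uses the coefficients for integral ages from Table 2, starts from the hypothetical initial guess x = 1, and recovers the positive real root together with the corresponding r = −log x.

```python
import math

# c_a for integral ages a (five-year units), as listed in Table 2.
C = {3: 0.02073, 4: 0.21781, 5: 0.33400, 6: 0.27427, 7: 0.18963,
     8: 0.10607, 9: 0.02268, 10: 0.00116, 11: 0.00001}


def newton_root(c, x0, tol=1e-12, max_iter=50):
    """Solve sum_a c_a x^a - 1 = 0 by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        f = sum(ca * x ** a for a, ca in c.items()) - 1.0
        df = sum(a * ca * x ** (a - 1) for a, ca in c.items())
        step = f / df
        x -= step
        if abs(step) < tol:
            break
    return x


x1 = newton_root(C, 1.0)
r1 = -math.log(x1)  # per five-year unit
print(round(x1, 5), round(r1, 5))
```

The same iteration accepts a complex starting value, so an approximate complex root taken from the integral-equation solution could be refined in the same way.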


TABLE 3

Constants of the series solution (6) of equation (3), corresponding to the five real and
three pairs of complex roots of the characteristic equation (5)
(United States, white females, 1920)

Constants(1)                  Five Real Roots                         Three Pairs of Complex Roots

A. Computed on basis of recurrent series

u              .02714*  -1.764†   -3.812†    -17.1†   -94.3†      -.19800    -.44720    -.47587
v              0         0         0          0        0          1.06498    1.57000    2.40490
G              5.64467   7.73354  -1256.04    (2)      (2)        5.28093   10.45809    7.73103
H              0         0         0          0        0          3.03239   -3.66726    2.00874
G/(G²+H²)      .17716    .12931   -.00080     (2)      (2)         .14241     .08515     .12117
H/(G²+H²)      0         0         0          0                    .08177    -.02986     .03148

B. Computed on basis of integral equation‡

u              .02714                                             -.1930     -.43065    -.4902
v              0                                                  1.0724     1.6771     2.44245
G              5.64614                                            5.13351   10.22495    7.40164
H              0                                                  2.95767   -3.72741    3.45312
G/(G²+H²)      .17715                                              .14626     .08020     .11095
H/(G²+H²)                                                          .08426    -.03135     .05175

(1) In five-year units.
(2) Not computed.
* $u = -\log_e x = -\log_e .97322 = .02714$.
† Values of x.
‡ See [6, p. 899].

To determine the remaining four roots, the product of the factors
$(y - y_1)(y - y_2)\cdots(y - y_7)$ was divided out of the polynomial of equation (10),
rejecting the remainder and leaving a fourth degree equation

$y^4 + 120\,y^3 + 2590\,y^2 + 14617\,y + 23118 = 0$.

In the subsequent work it turned out that the roots of this were all real, and
they were computed by obvious methods. Their values are also shown in
Table 3. For the two numerically largest roots great accuracy was not
attempted. They introduce terms with very rapid damping and presumably
very small values of Q.¹⁰

⁹ The divergence is due in part to details of computation. In the earlier publication
the curve of fertility m(a) was smoothed by the method of translation, with a Gaussian
distribution as basis. In the method here presented the raw data were used without
smoothing, except such as is inherent in the process of the calculation described.

¹⁰ They must be small, since $Q_1 + \cdots + Q_9 = 1.00313$, and according to
(24), with the convention that B(0) = 1, the sum of all the $Q_j$ must be equal to unity.

As a check, in order to be assured that no serious error was introduced in
neglecting the remainder after dividing out the factors $(y - y_1)$ up to $(y - y_7)$,
the product $\prod_{j=1}^{11}(y - y_j)$ was computed and, after multiplying by a factor to
make the absolute terms agree (.16635), was compared with the polynomial of
(10). As a further indication, the coefficients of the product Π were “decumulated”
to obtain values of coefficients of the corresponding polynomial in x, to
compare with values of $c_a$. The results are shown in Table 4. In view of the
fact that the (numerically) highest roots were determined only in first
approximation, the agreement is satisfactory.

TABLE 4

Coefficients of Powers of y in Equation (10) and in the Product $\prod_{j=1}^{11}(y - y_j)$;
Also Coefficients of Powers of x in Equation (5)

           Coefficients of y^a                 Coefficients of x^a in Equation (5)
                                               Found by Decumulation
  a     In Equation (10)   In Π(y - y_j)      Of Column (2)     Of Column (3)
 (1)         (2)                (3)                (4)               (5)

  0         .16635             .16635           -1.00000           -.99915
  1        6.64127            6.64072            +.00014*          +.00065
  2       16.64550           16.64782            -.00057*          -.00432
  3       24.34106           24.34197             .02118*           .02398
  4       23.16840           23.18070             .21781            .21774
  5       15.05650           15.07338             .33400            .33354
  6        6.7250             6.73812             .27427            .27474
  7        1.99717            2.00316             .18963            .18882
  8         .36404             .36555             .10607            .10641
  9         .03483             .03501             .02268            .02276
 10         .00127             .00128             .00116            .00117
 11         .00001             .00001             .00001            .00001

* In computing the denominator of Q according to (22) the values of the
coefficients $c_1$ and $c_2$ were arbitrarily made zero and the value of $c_3$ (age 15) was
adjusted to .02073 to retain the total $\sum_{a=1}^{11} c_a = 1.16635$.

It is to be noted that instead of applying the solution (6) to compute values of
B(t), these latter can, of course, also be obtained directly, by carrying forward
step by step the original recurrent series, or, alternatively, the births in
successive generations can be computed step by step and the total births obtained
by addition. The advantage of the solution (6) is that it enables one, if desired,
to obtain B(t) for any value of t without having to compute B(t) for all
intervening values of t; also, the solution in an exponential series gives a better idea
of the general nature of the process, as well as a direct indication of its asymptotic
course for large values of t, when the first term $Q_1 x_1^{-t}$ with the positive real
root $x_1$ dominates all others. However this may be, it is interesting to compare
the result of the computation by means of the exponential series, carried out as
set forth above, with the corresponding results of the computation of births in
successive generations. This comparison is exhibited in Table 5.

TABLE 5

Synopsis of Results of Computation of B(t) as $\sum Q x^{-t}$, Column (8), and as
$\sum_n B_n(t)$, Column (9), where $B_n(t)$ = Births per Unit of Time in nth
Generation at Time t. (Time Unit = 5 years)

Columns (2) to (4): $Q x^{-t} = x^{-t}/G$. Columns (5) to (7): $2e^{ut}(G\cos vt - H\sin vt)/(G^2 + H^2)$.

x or u     .97322     -1.764     -3.812    -.19800    -.44720    -.47587    Σ Q x^-t
v          0           0          0        1.06498    1.57000    2.40490

  t         (2)        (3)        (4)        (5)        (6)        (7)        (8)

  0       17,716     12,931       -80      28,482     17,030     24,234    100,313
  1       18,204     -7,330        21        -416      3,828    -13,781        527
  2       18,704      4,155        -6     -19,498     -6,958      3,329       -274
  3       19,219     -2,356         1           …          …          …      2,326
  4       19,748      1,336                     …      2,844     -3,362     21,688
  5       20,291       -757                     …          …      2,223          …
  6       20,850        429                     …     -1,162       -740     27,470
  7       21,423       -243                     …          …       -169     19,746
  8       22,013        138                     …        476        446     16,823
  9       22,619        -78                -4,294          …       -344          …
 10       23,241         44                   792          …        145          …
 11       23,880        -25                 3,519        -45         -1     27,328
 12       24,538         14                 2,265         79        -55     26,841
 13       25,213         -8                     …         18         61          …
 14       25,907          5                     …        -32        -26     23,878
 15       26,619         -3                -1,188         -8          4     25,424
 16       27,352          1                   385         13          …     27,767
 17       28,105         -1                     …          3         -7          …
 18       28,878          1                     …         -6          4          …
 19       29,673                             -261         -1         -1          …
 20       30,489                                …          2         -1     29,874
 21       31,328                                …          1          1     31,008
 22       32,191                              160         -1         -1     32,349
 23       33,076                              343                           33,419

TABLE 5, Continued

                                      B_n(t): Generations, n
  t       Σ B_n(t)       1          2          3          4          5          6
 (1)        (9)        (10)       (11)       (12)       (13)       (14)       (15)

  0      100,000
  1            0
  2            0
  3        2,072      2,072
  4       21,781     21,781
  5       33,398     33,398
  6       27,472     27,429         43
  7       19,866     18,963        903
  8       16,735     10,607      6,128
  9       17,954      2,268     15,685          1
 10       24,033        116     23,889         28
 11       27,361          1     27,022        338
 12       26,878                24,905      1,973
 13       24,696                18,481      6,214          1
 14       23,851                10,980     12,858         13
 15       25,410                 5,345     19,941        124
 16       27,759                 2,050     25,030        679
 17       29,219                   526     26,316      2,377
 18       29,506                    76     23,527      5,897          6
 19       29,414                     5     18,092     11,271         46
 20       29,862                           12,041     17,579        242
 21       31,000                            6,906     23,191        903
 22       32,348                            3,381     26,442      2,523          2
 23       33,423                            1,397     26,426      5,583         17

It will be seen that the agreement is good except for the second to fourth
items, where perhaps the omission of the terms contributed by the numerically
highest roots makes itself felt.
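The generation-by-generation computation referred to above can be sketched as a repeated convolution: the births of generation n at time t are $\sum_a c_a B_{n-1}(t-a)$, and the totals over all generations must agree with the recurrent series carried forward directly. The code below is a minimal illustration under the assumption of a single cohort of 100,000 births at t = 0 and the integral-age coefficients of Table 2; the function names are hypothetical.

```python
# c_1 .. c_11 for integral ages (five-year units), as listed in Table 2.
C = [0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427,
     0.18963, 0.10607, 0.02268, 0.00116, 0.00001]


def births_by_generation(c, b0, t_max, n_max):
    """gens[n][t] = births of generation n at time t; generation 0 is the cohort at t = 0."""
    omega = len(c)
    gens = [[0.0] * (t_max + 1) for _ in range(n_max + 1)]
    gens[0][0] = float(b0)
    for n in range(1, n_max + 1):
        for t in range(1, t_max + 1):
            gens[n][t] = sum(c[a - 1] * gens[n - 1][t - a]
                             for a in range(1, min(omega, t) + 1))
    return gens


def births_by_recurrence(c, b0, t_max):
    """B(t) carried forward step by step from B(0) = b0 with no earlier births."""
    omega = len(c)
    b = [float(b0)] + [0.0] * t_max
    for t in range(1, t_max + 1):
        b[t] = sum(c[a - 1] * b[t - a] for a in range(1, min(omega, t) + 1))
    return b


# With c_1 = c_2 = 0, generation n cannot appear before t = 3n, so 7 generations cover t <= 23.
gens = births_by_generation(C, 100_000, 23, 7)
totals = [sum(g[t] for g in gens[1:]) for t in range(24)]
direct = births_by_recurrence(C, 100_000, 23)
print([round(v) for v in totals[3:9]])
```

Small differences from the printed entries of Table 5 are to be expected, since the sketch uses the coefficients as rounded in Table 2.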

5. Discussion. 

(a) The real roots of the characteristic equation (5). It can be shown [8] that only
one of the real roots for x can be positive, and that the absolute value of any
other root must be greater than the positive real root.


The negative real roots which make their appearance in the numerical
example call for special comment. Practically, the “higher” negative roots are
of little importance, at any rate in this example: first because the constants Q
with which they are associated are relatively small, second because large absolute
values of negative roots imply rapid damping, so that corresponding terms $Q x^{-t}$
very soon become negligible as t increases. Thirdly, the determination of these
roots would be subject to a wide range of uncertainty, corresponding to the large
percentage fluctuations or errors of determination of the values of the functions
$p(a)m(a) = c_a$ at the upper end of the reproductive period.

But in theory these negative real roots suggest some pertinent questions.
One wonders what would happen to them if the data were given, say, for single
years of age, instead of 5-year groups. Instead of an equation of eleventh degree
we would then have one of 55th degree. Furthermore, in those cases in which it
may be permissible to pass to the limit, so that an integral equation takes the
place of (2), negative roots for x would seem to be excluded as they would make
the integral in (13) meaningless.

A problem of perhaps little practical importance but of some theoretical
interest may arise here, to which reference has also been made by P. H. Leslie in a
recent article in Biometrika,¹¹ in connection with a different procedure.


(b) Effect of finer subdivisions of histogram of p(a)m(a). The effect of this on
equation (5) for x is not obvious at sight, since new coefficients would be
inserted between previous terms. The effect is more easily understood from a
consideration of equation (8) for y. Here finer subdivisions would introduce new
terms only beyond the last term originally present. The original terms would
not be changed at all in form, and those involving only lower moments would
be changed but little in numerical value, provided that the original histogram were
not so coarse as to give inappropriate values even for these lower moments.

The result, then, of finer subdivision of the histogram, would be to change the
computed values of the lower roots only in minor degree. But the four negative
real roots, depending in considerable measure on the higher terms of (5) or (8),
would presumably be materially altered, and might perhaps give place to further
complex roots. In any case they would be followed by new roots even more
remote from practical significance than the original eleven.


(c) The result as an interpolation formula. Strictly speaking, the solution (6)
of (2) is applicable only for integral values of t. In particular, terms arising
out of the negative real roots of (5) for x are obviously not adapted to furnish
interpolated values of B(t) for fractional values of t, since fractional powers of
negative quantities in general are complex. Over the range of t where the first
real root together with the three pairs of complex roots adequately describe the
process under discussion, these terms alone are, in this sense and to this extent,
suitable for interpolation, disregarding the terms corresponding to the other
negative roots.

¹¹ For a summary and analysis of Leslie’s paper [9] see a review … “Population
Waves,” Jour. of Burma Research Soc., Vol. 31 (1941), Part I, …
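The remark on fractional powers can be illustrated directly. In the sketch below, which assumes only the negative real root x = −1.764 listed in Table 3, an integral value of t gives a real term, while a fractional value of t gives a complex number, so that the term has no real interpolated value between integral times.

```python
x = -1.764  # a negative real root of (5), as listed in Table 3

at_integral_t = x ** -2       # t = 2: an ordinary real number
at_fractional_t = x ** -0.5   # t = 1/2: Python returns a complex number

print(at_integral_t)
print(at_fractional_t)
```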

Even less suitable for interpolation purposes, it would seem, would be terms
arising from further negative roots that might be introduced by a finer
subdivision of the histogram of original data. If we suppose this subdivision carried
to great lengths, and if negative roots still appeared under these circumstances,
they would give rise to rapidly oscillating positive and negative terms for even
and odd integral values of t respectively (the time unit now being a subdivision
of the original time unit) with no appropriate interpolation between these
integral values.

One further point calls for comment. In the process of idealization of the
problem discussed, it has been assumed that p(a) and m(a) are independent of
time, and the conclusions reached must be construed in the light of this
assumption. In itself this would hardly call for comment, as it is a matter of common
understanding. But the question does arise whether the assumption itself is
free from implied internal contradictions.

In a recent publication, P. K. Whelpton¹² has drawn attention to the fact that
in times of rapid changes in the birth rate, the assumption of age specific fertilities
being held constant at the values observed in a given calendar year may imply
that some of the women had more than one first child, a logical impossibility.

The data used in the present numerical example are derived from a period of
relatively undisturbed birth rate (1920), and do not involve any such conflict.
But, in the light of Whelpton’s contribution one may ask the broader question
whether the computation of an intrinsic rate of natural increase and related
parameters based on age specific fertility as observed in one calendar year
retains any practical value at all.

In answering this question, two considerations will be weighed. First, that
ordinarily the rates computed in the usual way differ but little from those
obtained by taking into account order of birth as in Whelpton’s procedure.
Secondly, that the computation using over-all values of m(a) for all orders of birth
combined is a relatively simple matter based on data commonly available;
whereas the more complete treatment of the problem taking into account order
of birth is considerably more complicated and often not possible at all for lack
of detailed data.

REFERENCES

[1] E. C. Rhodes, “Population mathematics,” Roy. Stat. Soc. Jour., (1940), pp. 220, 232.

[2] S. D. Wicksell, “Bidrag till den formella befolkningsteorien,” Statsøkonomisk Tidsskrift, Heft 1-2 (1934), p. 1.

[3] E. J. Gumbel, “Eine Darstellung statistischer Reihen durch Euler,” Jahresber. der Deutschen Mathematiker Vereinigung, Vol. 25, p. 257.

[4] W. Feller, “On the integral equation of renewal theory,” Annals of Math. Stat., Vol. 12 (1941), p. 243.

[5] A. J. Lotka, “Contribution to the theory of self-renewing aggregates with special reference to industrial replacement,” Annals of Math. Stat., Vol. 10 (1939), p. 11.

[6] A. J. Lotka, “The progeny of a population element,” Am. Jour. of Hygiene, Vol. 8 (1928), p. 892.

[7] A. J. Lotka, “The progeny of an entire population,” Annals of Math. Stat., Vol. 13 (1942), pp. 117-118.

[8] W. Dobbernack and G. Tietz, Twelfth International Congress of Actuaries, Vol. 4 (1940), p. 233.

[9] P. H. Leslie, “On the use of matrices in certain population mathematics,” Biometrika, Vol. 33 (1945), pp. 202 and 206.

[10] E. G. Lewis, “On the generation and growth of a population,” Sankhya, Vol. 6 (1942), Part I, p. 93 et seq.

[11] P. K. Whelpton, “Reproduction rates adjusted for age, parity, fecundity, and marriage,” Am. Stat. Assn. Jour., Vol. 41 (1946), p. 501.

¹² See [11]. Another refinement recently introduced into the measurement of
reproductivity is to take into account duration of marriage. See Colin Clark and R. E. Dyne,
Economic Record (Australia), June 1946, p. 23.



SOLUTION OF EQUATIONS BY INTERPOLATION 

W. M. Kincaid 
University of Michigan 

Introduction and summary. The present paper deals with the numerical
solution of equations by the combined use of Newton’s method and inverse interpolation.
In Part I the case of one equation in one unknown is discussed.
The methods described here were developed by Aitken [1] and Neville [2], but
do not seem as widely known as they should be, perhaps because the original
papers are not readily available. (A short summary of Aitken’s work will be
found in a recent paper by Womersley [3].) Mention should also be made of
an interesting paper by Spoerl [4], which treats the same problem from a somewhat
different viewpoint.

In Part II these methods are extended to sets of simultaneous equations.


PART I. EQUATIONS IN ONE UNKNOWN 

1. Nature of the problem. We first consider the problem of locating, to any
desired degree of accuracy, a real root $x_0$ of an equation of the form

(1)    $y(x) = 0$,

where $y(x)$ is assumed to be analytic in an interval containing the root in question.
Since we shall not be concerned here with the necessary preliminary work
of separating the roots, etc., we may suppose that $x_0$ is known to lie within a
given interval that contains no zeros of $y'(x)$. (Multiple roots are thus excluded;
but of course any such root is a simple root of an equation obtained from
(1) by differentiation, and the methods described below can be applied to this
equation.)


2. Aitken’s method of interpolation. The method to be described, which
may be regarded as a generalization of Newton’s, depends on the use of inverse
interpolation. It is therefore desirable to recall a few points from the theory of
interpolation before proceeding further.

Let $f$ be a function such that $f(t)$ is known for $t = t_1, t_2, \ldots, t_n$. Then the
Lagrange interpolating polynomial $f_{12\cdots n}(t)$ is defined by

(2)
\[
f_{12\cdots n}(t) = f(t_1)\,\frac{(t - t_2)(t - t_3)\cdots(t - t_n)}{(t_1 - t_2)(t_1 - t_3)\cdots(t_1 - t_n)}
 + f(t_2)\,\frac{(t - t_1)(t - t_3)\cdots(t - t_n)}{(t_2 - t_1)(t_2 - t_3)\cdots(t_2 - t_n)}
 + \cdots
 + f(t_n)\,\frac{(t - t_1)(t - t_2)\cdots(t - t_{n-1})}{(t_n - t_1)(t_n - t_2)\cdots(t_n - t_{n-1})}.
\]



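We note that

(3)
\[
f_{12}(t) = f(t_1)\,\frac{t - t_2}{t_1 - t_2} + f(t_2)\,\frac{t - t_1}{t_2 - t_1}
          = \frac{1}{t_1 - t_2}\begin{vmatrix} f(t_1) & t - t_1 \\ f(t_2) & t - t_2 \end{vmatrix},
\]
\[
f_{123}(t) = \frac{1}{t_1 - t_3}\begin{vmatrix} f_{12}(t) & t - t_1 \\ f_{23}(t) & t - t_3 \end{vmatrix},
\quad \ldots, \quad
f_{12\cdots n}(t) = \frac{1}{t_1 - t_n}\begin{vmatrix} f_{12\cdots(n-1)}(t) & t - t_1 \\ f_{23\cdots n}(t) & t - t_n \end{vmatrix},
\]

so that $f_{123\cdots n}(t)$ can be evaluated for any given value $t_0$ of $t$ by a succession of
linear interpolations. It is convenient to arrange the work in a table like the
following (n = 4):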

TABLE Ia

  t      f(t)      I             II             III             Parts

  t_1    f(t_1)                                                 t_0 - t_1
                   f_12(t_0)
  t_2    f(t_2)                  f_123(t_0)                     t_0 - t_2
                   f_23(t_0)                    f_1234(t_0)
  t_3    f(t_3)                  f_234(t_0)                     t_0 - t_3
                   f_34(t_0)
  t_4    f(t_4)                                                 t_0 - t_4



This form is well adapted for machine computation, for each denominator
$t_i - t_j = (t_0 - t_j) - (t_0 - t_i)$ automatically appears in one set of counters when
the corresponding numerator is obtained in the other.
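
The tabular scheme of relation (3) can be sketched in a few lines of Python. This is only an illustration, not the machine procedure described in the text: the function name `neville`, the list-of-columns return value and the cubic test data are assumptions introduced here. Each column is formed from the previous one by the 2x2 determinant of (3), and the divisor is obtained as a difference of two entries of the "parts" column.

```python
def neville(ts, fs, t0):
    """Evaluate the interpolating polynomial through (ts[i], fs[i]) at t0.

    Returns the final interpolate and the list of columns
    [f(t_i)], [column I], [column II], ... built by relation (3).
    """
    n = len(ts)
    parts = [t0 - t for t in ts]      # the "parts" column t0 - t_i
    col = list(fs)
    columns = [col]
    for k in range(1, n):
        new = []
        for i in range(n - k):
            j = i + k
            # 2x2 determinant of (3); the divisor t_i - t_j equals parts[j] - parts[i]
            num = col[i] * parts[j] - col[i + 1] * parts[i]
            new.append(num / (parts[j] - parts[i]))
        col = new
        columns.append(col)
    return col[0], columns


# Hypothetical check: four points of f(t) = t**3 reproduce the cubic at t0 = 1.5.
value, columns = neville([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 8.0, 27.0], 1.5)
```

Here `columns[1]`, `columns[2]`, `columns[3]` play the role of columns I, II, III of Table Ia.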
If $f'(t)$ is known at one or more of the given points, this information is readily
fitted into the scheme. For we see that

(4)    $f_{11}(t) = \lim_{t_2 \to t_1} f_{12}(t) = f(t_1) + (t - t_1) f'(t_1)$

and all that is necessary is to repeat certain entries in Table Ia and to fill in column
I by using (4) as indicated in Table Ib. The extension to higher derivatives
is obvious.


TABLE Ib

[Same layout as Table Ia (columns t, f(t), I, II, III, Parts), with each point at
which $f'(t)$ is known entered twice and the corresponding entry of column I
obtained from (4). The individual entries are not legible in this copy.]


In applying the above to obtaining the root $x_0$ of (1), we must suppose that
y(x) is tabulated or can be computed for a set of values of x in the neighborhood
of $x_0$. What we do not know is the value of x corresponding to y = 0. It is
therefore convenient to regard x as a function of y whose value is known at certain
points and then interpolate to get $x_0 = x(0)$. That is, we let y take the place
of t and x that of f(t) in the preceding discussion, while 0 replaces $t_0$. The work
is slightly simplified by the fact that the column of “parts” becomes identical
with the left-hand column which contains the y’s and can therefore be omitted.


3. Application to an example. The procedure will be most clearly indicated
by an example. Consider the equation

(5)    $y = x^4 + 2x^3 - 5x^2 - 8x + 1 = 0$,

which has a root between 0 and 1. (If the root were located elsewhere, it would
be desirable to shift it to this interval in order to simplify the computation of y.)

The work of evaluating this root to ten places is summarized in Table II, and
explained below. In the first column, the numbers in parentheses are values
of $dy/dx$, and the other numbers are values of y, corresponding to the values of x
in the second column.


TABLE II

         y                     x
   1.000 000 000 000         0.000
  (-8.000 000 000)           0.000
   0.152 100 000 000         0.100
  -0.001 054 385 279         0.117
  (-9.081 459 548)           0.117
   0.008 022 855 936         0.116
  (-9.073 020 416)           0.116

Successive entries, from top to bottom, of the columns of interpolates:

  I:    0.125 000 000 00,  0.117 938 436 13,  0.116 882 964 17,
        0.116 883 896 94,  0.116 883 842 98,  0.116 884 254 15
  II:   0.116 671 702 00,  0.116 884 075 87,  0.116 883 890 62,
        0.116 883 890 67,  0.116 883 890 74
  III:  0.116 883 877 01,  0.116 883 890 68,  0.116 883 890 68

  $x_0$ = 0.116 883 890 7



The procedure is as follows. Taking x = 0 as a first approximation to $x_0$,
we find that y(0) = 1, y'(0) = -8, and record these data in the y and x columns
of the table. Note that for convenience, the value of y'(0) takes the place of a
blank entry in Table Ib. We now apply (4), which here takes the form

(6)
\[
x_{11}(0) = x(1) + (0 - 1)\left.\frac{dx}{dy}\right|_{y=1}
          = 0 + \frac{-1}{\left.\dfrac{dy}{dx}\right|_{x=0}}
          = 0 + \frac{-1}{-8} = 0.125
\]

and enter the result in column I. Note that this is equivalent to one step of
Newton’s method.

In view of (6) we take x = 0.1 for our next approximation and apply (3) to
obtain the second entry in column I and the first in column II. This last suggests
x = 0.117 for our next trial value. (We do not compute y'(0.1), as little
would be gained by doing so, and the time is better spent in going ahead as
indicated.) Finding y(0.117) and filling in the table gives us the root to six
places.
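
As a check on the arithmetic, the scheme of Table II can be sketched in Python. This is an illustrative reconstruction rather than the author's desk-machine procedure: the function names and the way repeated points are encoded (a flag saying whether the derivative is used) are assumptions made here. Points at which the derivative is used are entered twice, and the corresponding column I entry comes from relation (4) with dx/dy = 1/y'(x).

```python
def y(x):
    return x**4 + 2*x**3 - 5*x**2 - 8*x + 1      # equation (5)


def dy(x):
    return 4*x**3 + 6*x**2 - 10*x - 8            # dy/dx


def inverse_interpolation_columns(points):
    """points: list of (x, use_derivative). Returns columns I, II, III, ..."""
    ys, xs = [], []
    for x, twice in points:
        reps = 2 if twice else 1                  # a point with a derivative is entered twice
        ys += [y(x)] * reps
        xs += [x] * reps
    n = len(xs)
    parts = [0.0 - v for v in ys]                 # target value y = 0 minus each tabulated y
    col, columns = xs[:], []
    for k in range(1, n):
        new = []
        for i in range(n - k):
            j = i + k
            if k == 1 and xs[i] == xs[j]:
                # repeated point: relation (4), one Newton step
                new.append(xs[i] + parts[i] / dy(xs[i]))
            else:
                # relation (3): linear interpolation between neighbouring entries
                new.append((col[i] * parts[j] - col[i + 1] * parts[i])
                           / (parts[j] - parts[i]))
        col = new
        columns.append(col)
    return columns


# Trial values used in the text: 0 and 0.117, 0.116 with derivatives, 0.1 without.
columns = inverse_interpolation_columns(
    [(0.0, True), (0.1, False), (0.117, True), (0.116, True)])
root = columns[-1][0]
```

`columns[0]` begins 0.125, 0.11793843613, ..., in agreement with column I of Table II; the sketch carries the scheme through all columns rather than stopping at column III.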

4. Employment of tables. Continuing in the same line, it would seem natural
to take x = 0.116884 at the next step, and doing so would lead to the most
rapid convergence. But another consideration enters. Up to this point the
values of y were computed with the aid of the WPA Table of Powers, which is
limited to three places in the argument. Rather than going to the extra labor
of evaluating y(.116884), we proceed as indicated in the table, using
y(.116) and y'(.116), and stopping when the values of x in the last column
agree to the desired number of places.

This point has been dwelt on because it is likely to arise whenever tables are
used in evaluating y(x). In the example just given, to be sure, we had a certain
freedom of choice; but if y(x) is not algebraic, direct computation may be quite
unpractical. It may be noted that in such cases the method of inverse interpolation
is not only faster than the simple Newton’s method but is capable of
giving more accurate results.

The error in the final result can be estimated from the standard formula for
the error of interpolation, but this may be awkward because it requires the
evaluation of higher derivatives of x with respect to y. In practice it is generally
safe to rely on agreement of different interpolated values, and of course the result
may be checked by substitution in the original equation. One simple point is
worth noting, however: if the error in the original column of x’s is $O(\epsilon)$, that in
the successive columns to the right is $O(\epsilon^2)$, $O(\epsilon^3)$, etc.

5. Applicability of the method. Although the example we have presented is
algebraic, the method is, of course, equally applicable to transcendental equations.
Moreover, it can be used, theoretically at least, to yield complex as well
as real roots. The sole difficulty is that the numerical work becomes cumbersome
in this case; how serious it is depends on the type of computing machines
used. If the equation is algebraic, Bernoulli’s [5], [6] and Graeffe’s [7] methods
are applicable. In fact, they are likely to be the most effective since they do not
require prior knowledge of a first approximation to the root. If the alternative
procedure of replacing the equation by two simultaneous equations for the real
and imaginary parts of the root is decided upon, the methods described in the
next section may prove useful.

PART II. SETS OF SIMULTANEOUS EQUATIONS

6. Two equations; general considerations. It is natural to take up next the
problem of finding the simultaneous solutions of two equations in two unknowns.
Let these equations be

(7)    $u(x, y) = 0, \qquad v(x, y) = 0,$

where u and v are analytic functions of x and y.

If we had a general method of interpolation of functions of two independent
variables, the problem could be solved in a fashion similar to that used in the
preceding section. That is, u and v would be computed for values of x and y
near the desired ones; then x and y would be regarded as functions of u and v
and interpolations would be performed to obtain the values corresponding to
u = v = 0.


It is easy to set up interpolating functions in a variety of ways, but the author
has found none that are satisfactory for the problem in hand. Note that what is
required is to determine the value of a function at any point in the plane, given
its values at a set of fixed points. The most obvious idea is to use polynomials of
the least possible degree for this purpose, as is done in the case of a single variable.
In this case, however, the coefficients of a polynomial of the nth degree are determined
by its values not at n + 1 but at $(n + 1)(n + 2)/2$ points; thus if a function
is given at 5 points, no unique quadratic interpolating polynomial can be
constructed. What is worse, even if a function is given at 6 points, say, the
quadratic polynomial determined will in general have large coefficients and take
on unreasonable values if all the points happen to lie close to a common conic.
Other schemes considered by the author have similar drawbacks, though the
possibility of course remains of finding a suitable one by further research.

The problem can also be handled, at least in principle, by eliminating one of the 
variables; but, apart from the difficulty of carrying this out in practice, the 
resulting single equation is likely to be more complicated in form than the original 
two. If so, solving it may require more computation than would be involved in 
attacking the original equations directly by the methods described below. So 
far is this true that even when a single equation is given in the first place it may 
be advantageous to replace it by a set of simpler equations.


7. Newton’s Method. Although a direct extension of the method of inverse 
interpolation is not presently available, Newton’s method may be suitably 
generalized for this case. 

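Starting with equations (7), we set up the auxiliary variables

(8)    $X = u v_y - v u_y, \qquad Y = u v_x - v u_x,$

where the subscripts denote partial derivatives: $u_x = \partial u/\partial x$, $u_y = \partial u/\partial y$, etc.
We have

(9)
\[
\frac{\partial X}{\partial x} = u_x v_y - v_x u_y + u v_{xy} - v u_{xy}, \qquad
\frac{\partial X}{\partial y} = u v_{yy} - v u_{yy},
\]
\[
\frac{\partial Y}{\partial x} = u v_{xx} - v u_{xx}, \qquad
\frac{\partial Y}{\partial y} = u_y v_x - v_y u_x + u v_{xy} - v u_{xy}.
\]

For u = 0, v = 0, equations (8), (9) reduce to

(10)
\[
X = Y = 0, \qquad
\frac{\partial X}{\partial y} = \frac{\partial Y}{\partial x} = 0, \qquad
\frac{\partial X}{\partial x} = -\frac{\partial Y}{\partial y} = J,
\]

where $J = u_x v_y - u_y v_x$ is the Jacobian of u and v with respect to x and y.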

Equations (10) will hold approximately for values of x and y near those satisfying
equations (7). That is, in the neighborhood of a solution X can be regarded
as a function of x alone and Y as a function of y alone. Then if $x = x_0$,
$y = y_0$ is the desired solution, $(x_1, y_1)$ is a point in its neighborhood, and
$X_1 = X(x_1, y_1)$, $Y_1 = Y(x_1, y_1)$, $J_1 = J(x_1, y_1)$, we have

(11)
\[
x_0 \approx x_1 - \frac{X_1}{J_1}, \qquad y_0 \approx y_1 + \frac{Y_1}{J_1}.
\]

Also if $(x_2, y_2)$ is another point near $(x_0, y_0)$,

(12)
\[
x_0 \approx \frac{x_1 X_2 - x_2 X_1}{X_2 - X_1}, \qquad
y_0 \approx \frac{y_1 Y_2 - y_2 Y_1}{Y_2 - Y_1}.
\]

Relations (11) and (12) can be used to obtain successive approximations to the
solution. Use of these relations corresponds to employing Newton’s method
and linear interpolation for the solution of one equation in one unknown.

As a first example we consider the equations

(13)
\[
u = x^2 + xy + y^2 - 3 = 0, \qquad v = x^2 y + y^2 - 1 = 0.
\]

We have

(14)
\[
u_x = 2x + y, \quad u_y = x + 2y, \qquad v_x = 2xy, \quad v_y = x^2 + 2y.
\]

Drawing a rough graph indicates a solution near (1, 1). We evaluate u, v, etc.,
at this point as shown in Table III. Using (11) we get (2, 0) for a second approximation,
and proceed as before. We can now use both (12) and (11) to get new
approximations; they are (1.33, 0.57) and (1.25, 0.50), and are entered in the
last two columns of the table. We therefore try (1.3, 0.5) next, and continue
in this fashion until the desired accuracy is attained. Both (11) and (12) are
used at each step and the values obtained are entered in the last two columns.
The entries in the numbered rows are obtained by using (11), the others by using
(12). The number of places to take in each succeeding step is judged from the
agreement shown.
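
Relations (11) and (12) applied to the example (13) can be sketched as follows. The function names are introduced here for illustration only, and the loop simply repeats (11) at full machine precision instead of choosing rounded trial values by judgment as the text does.

```python
def u(x, y):
    return x*x + x*y + y*y - 3.0


def v(x, y):
    return x*x*y + y*y - 1.0


def aux(x, y):
    """Auxiliary variables X, Y of (8) and the Jacobian J, using (14)."""
    ux, uy = 2*x + y, x + 2*y
    vx, vy = 2*x*y, x*x + 2*y
    X = u(x, y)*vy - v(x, y)*uy
    Y = u(x, y)*vx - v(x, y)*ux
    J = ux*vy - uy*vx
    return X, Y, J


def newton_step(x1, y1):
    """Relation (11)."""
    X1, Y1, J1 = aux(x1, y1)
    return x1 - X1/J1, y1 + Y1/J1


def secant_step(p1, p2):
    """Relation (12): linear interpolation of x against X and of y against Y."""
    (x1, y1), (x2, y2) = p1, p2
    X1, Y1, _ = aux(x1, y1)
    X2, Y2, _ = aux(x2, y2)
    return (x1*X2 - x2*X1)/(X2 - X1), (y1*Y2 - y2*Y1)/(Y2 - Y1)


second = newton_step(1.0, 1.0)                    # starting from (1, 1)
third = secant_step((1.0, 1.0), second)           # uses both points

x, y = 1.0, 1.0
for _ in range(20):                               # repeated use of (11) alone
    x, y = newton_step(x, y)
```

The first step reproduces the second approximation (2, 0) of the text, and the use of (12) on the first two points gives the pair that the text rounds to (1.33, 0.57).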

Table IV indicates the process of finding a second solution of (13) by the same
method. The convergence is very rapid in this case, mainly because the first
guess is fairly close.



TABLE III

  No.   x            y             u_x           u_y           v_x             v_y
  1     1.0          1.0           3.0           3.0           2.0             3.0
  2     2.0          0.0           4.0           2.0           0.0             4.0
  3     1.3          0.5           3.1           2.3           1.3             2.69
  4     1.5          0.4           3.4           2.3           1.2             3.05
  5     1.51         0.37          3.39          2.25          1.1174          3.0201
  6     1.5138       0.3750        3.4026        2.2638        1.13535         3.04159044
  7     1.5138345    0.37499651    3.40266551    2.26382752    1.1353653084    3.0416879134

[The columns giving u, v, X, Y, J and the approximations obtained from (11) and
(12) are not legible in this copy.]

Final values: $x_0$ = 1.5138345184093, $y_0$ = 0.3749965131978.




TABLE IV

[Computation of a second solution of (13), arranged as in Table III. The entries
are not legible in this copy.]






8. Inverse interpolation. In the preceding section, attention was drawn to
the difficulty that may arise when tables, necessarily limited to a certain number
of places in the argument, are used in the computation. In the example just
discussed the values of u and v were easily computed directly to the number of
places wanted. But a glance at the work will show that if we had been limited in
computing u and v to values of x and y having, say, two decimal places, the solutions
could have been carried to four places only.

The device adopted in the preceding section was to use quadratic and cubic
interpolates to secure greater accuracy, and it might occur to us to try the same
idea here. But for such an interpolation to be strictly valid, equations (10)
would have to hold identically. Since they hold only approximately, an error
is introduced which, in general, is of the same order of magnitude as the error in
linear interpolation. Thus continuing the interpolation would not improve the
results.

However, this very situation suggests a way out. For suppose we give x a
constant value $x_1$, and compute X and Y for a number of values of y. For $x = x_1$,
both X and Y can be regarded as functions of y alone, or we can regard X
and y as functions of Y. Doing so, we can interpolate to any number of stages
to find values of X and y corresponding to Y = 0; call these $X_1$, $y_1$. Assigning
x other constant values $x_2, x_3, \ldots, x_m$, we repeat the process, getting a set
of values $X_2, \ldots, X_m$ and $y_2, \ldots, y_m$, all corresponding to Y = 0. Now
along the curve Y = 0 we can regard x and y as functions of X; performing one
more interpolation, we obtain the desired values of x, y corresponding to X =
Y = 0. The error in the final result can be estimated from the errors in the
interpolations, and is of the same order of magnitude as the greatest of these.

It will be noted that we did not refer to the definitions of X and Y in describing
this procedure. Any pair of independent (analytic) functions X' and Y' having
the property that X' = Y' = 0 when u = v = 0 could be used. However, it
is convenient to choose them so that $\partial X'/\partial y$ and $\partial Y'/\partial x$ are small. Probably the
simplest course is to set

    $X' = a_1 u + b_1 v, \qquad Y' = a_2 u + b_2 v,$

where $a_1$, $a_2$, $b_1$, $b_2$ are constants such that

\[
\frac{a_1}{b_1} \approx -\frac{v_y}{u_y}, \qquad \frac{a_2}{b_2} \approx -\frac{v_x}{u_x}.
\]

Let us apply this procedure to the example we have already worked (Table III).
Suppose we wish to use values of x and y having not more than two decimal
places. Within this restriction, we can still carry through the first few steps
indicated in Table III to ascertain that $x_0 \approx 1.514$, $y_0 \approx 0.375$, where $(x_0, y_0)$
is the desired solution. At the point (1.51, 0.37) we have

    $X = 3.0201u - 2.25v, \qquad Y = 1.1174u - 3.39v.$



TABLE V

[First stage: for each of x = 1.50, 1.51, 1.52, 1.53, the values of X' and Y' at
y = 0.36, 0.37, 0.38, 0.39 and the successive interpolates of y and X' at Y' = 0.
These entries, and the final interpolation of y, are not legible in this copy; the
results are quoted in the text.]

Final stage, interpolation of x to X' = 0 (the last digit of each entry is the extra
digit carried to take care of rounding-off):

  X'                   x       I                  II                 III
  -.1406250024 7       1.50
  -.03908741039 0      1.51    1.5138495506 5
   .06302572977 3      1.52    1.5138278631 3     1.5138345680 7
   .1657149545 4       1.53    1.5138624787 5     1.5138344615 3     1.5138345191 9



Noting the ratios of the coefficients of u and v, we select

    $X' = 4u - 3v, \qquad Y' = u - 3v.$

Next we evaluate X' and Y' for the 16 points having x-coordinates 1.50,
1.51, 1.52, 1.53 and y-coordinates 0.36, 0.37, 0.38, 0.39, as shown in Table V.
Starting with the four points for which x = 1.50, we interpolate to find the values
of y and X' corresponding to Y' = 0; they are $y_1$ = .3750000007, $X'_1$ = -.1406250025.
We proceed in the same way with the points corresponding to the other
values of x; the results, as shown, are $y_2$ = .3749981706, $X'_2$ = -.03908741039;
$y_3$ = .3749927660, $X'_3$ = .06302572977; $y_4$ = .3749839124, $X'_4$ = .1657149545.
(The extra digits given in Table V are to take care of rounding-off.) Finally,
using these values, we interpolate to find the values of x and y corresponding to
X' = 0, and get

    x = 1.5138345192,    y = .3749965140.

Comparing these results with those obtained earlier, we see that they are in error
by about 1 unit in the ninth place, a distinct improvement over the four correct
places that could have been secured without using this device. Note that if we
had not had our earlier results for comparison, a check could have been obtained
by carrying through the interpolation in the reverse order; i.e., starting with
fixed values of y and finding values of x and Y' corresponding to X' = 0.
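
The two-stage procedure just described can be sketched in Python. The helper `neville` and the variable names are assumptions introduced here; the grid, the functions X' = 4u - 3v, Y' = u - 3v and the order of the interpolations are those of the text. Full floating-point values are used, so the last digits need not agree exactly with the rounded entries of Table V.

```python
def neville(ts, fs, t0=0.0):
    """Value at t0 of the polynomial through (ts[i], fs[i]), by relation (3)."""
    parts = [t0 - t for t in ts]
    col = list(fs)
    n = len(ts)
    for k in range(1, n):
        col = [(col[i] * parts[i + k] - col[i + 1] * parts[i])
               / (parts[i + k] - parts[i]) for i in range(n - k)]
    return col[0]


def u(x, y):
    return x*x + x*y + y*y - 3.0


def v(x, y):
    return x*x*y + y*y - 1.0


def X_prime(x, y):
    return 4*u(x, y) - 3*v(x, y)


def Y_prime(x, y):
    return u(x, y) - 3*v(x, y)


x_grid = [1.50, 1.51, 1.52, 1.53]
y_grid = [0.36, 0.37, 0.38, 0.39]

# Stage 1: for each fixed x, interpolate y and X' as functions of Y' to Y' = 0.
stage1 = []
for x in x_grid:
    Y_vals = [Y_prime(x, y) for y in y_grid]
    X_vals = [X_prime(x, y) for y in y_grid]
    stage1.append((x, neville(Y_vals, y_grid), neville(Y_vals, X_vals)))

# Stage 2: along the curve Y' = 0, interpolate x and y as functions of X' to X' = 0.
X_on_curve = [row[2] for row in stage1]
x0 = neville(X_on_curve, [row[0] for row in stage1])
y0 = neville(X_on_curve, [row[1] for row in stage1])
```

The first row of `stage1` corresponds to the values $y_1$ and $X'_1$ quoted above, and `(x0, y0)` to the final pair obtained from Table V.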

As in the case of one equation in one unknown, derivatives could be brought
into the interpolation scheme, permitting greater accuracy with fewer points.
But the derivatives needed would be of the form $(\partial y/\partial Y')_x$, etc., and the general
setup would be rather awkward, so that extra labor would probably be required.

9. Three or more equations. The methods discussed in this section are
readily extended to the solution of three or more simultaneous equations in an
equal number of unknowns. For example, if we are given three equations of the
form

    $u(x, y, z) = 0, \qquad v(x, y, z) = 0, \qquad w(x, y, z) = 0,$

we define new variables

\[
X = \begin{vmatrix} u & v & w \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{vmatrix}, \qquad
Y = \begin{vmatrix} u_x & v_x & w_x \\ u & v & w \\ u_z & v_z & w_z \end{vmatrix}, \qquad
Z = \begin{vmatrix} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u & v & w \end{vmatrix},
\]

which are analogous to the X and Y of (8); from this point on the work is practically
the same as before.
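
A sketch of how such determinants can drive a Newton-type step for three equations is given below. Everything specific in it is an assumption made for illustration: the test system (with a solution at (1, 2, 3)), the function names, and the update x - X/J, y - Y/J, z - Z/J, which follows from Cramer's rule when X, Y, Z are formed by replacing one row of the Jacobian determinant J by (u, v, w); the paper does not write this update out for three variables.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)


def funcs(x, y, z):
    # hypothetical system u = v = w = 0 with a solution at (1, 2, 3)
    return (x + y + z - 6.0, x*x + y*y + z*z - 14.0, x*y*z - 6.0)


def partials(x, y, z):
    d_dx = (1.0, 2*x, y*z)        # (u_x, v_x, w_x)
    d_dy = (1.0, 2*y, x*z)        # (u_y, v_y, w_y)
    d_dz = (1.0, 2*z, x*y)        # (u_z, v_z, w_z)
    return d_dx, d_dy, d_dz


def step(x, y, z):
    f = funcs(x, y, z)
    dx, dy, dz = partials(x, y, z)
    J = det3((dx, dy, dz))        # Jacobian determinant
    X = det3((f, dy, dz))         # row of x-derivatives replaced by (u, v, w)
    Y = det3((dx, f, dz))
    Z = det3((dx, dy, f))
    return x - X/J, y - Y/J, z - Z/J


p = (1.1, 1.9, 3.1)               # assumed first approximation
for _ in range(10):
    p = step(*p)
```

With the sign convention of (8) for two equations the y-correction enters with a plus sign, as in (11); with the row-replacement convention used in this sketch all three corrections are subtracted.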

REFERENCES

[1] A. C. Aitken, “On interpolation by iteration of proportional parts, without the use
    of differences,” Edinb. Math. Soc. Proc., ser. 2, Vol. 3 (1932), pp. 56-76.
[2] E. H. Neville, “Iterative interpolation,” Indian Math. Soc. Jour., Vol. 20 (1934), pp.
    87-120.
[3] J. R. Womersley, “Scientific computing in Great Britain,” Mathematical Tables and
    Aids to Computation, Vol. 2 (1946), pp. 110-117.
[4] C. A. Spoerl, “Solving equations in the machine age,” Amer. Inst. Actuar. Record,
    Vol. 31, Part I (1942), pp. 129-149.
[5] T. C. Fry, “Some numerical methods for locating roots of polynomials,” Quart. Appl.
    Math., Vol. 3 (1945), pp. 89-105.
[6] W. M. Kincaid, “Numerical methods for finding roots and vectors of matrices,” Quart.
    Appl. Math., Vol. 5 (1947), pp. 320-345.
[7] E. Bodewig, “On Graeffe’s method for solving algebraic equations,” Quart. Appl.
    Math., Vol. 4 (1946), pp. 177-190.



ESTIMATION OF A PARAMETER WHEN THE NUMBER OF 
UNKNOWN PARAMETERS INCREASES INDEFINITELY WITH 
THE NUMBER OF OBSERVATIONS 

By Abraham Wald 
Columbia University 

Summary. Necessary and sufficient conditions are given for the existence
of a uniformly consistent estimate of an unknown parameter $\theta$ when the successive
observations are not necessarily independent and the number of unknown
parameters involved in the joint distribution of the observations increases indefinitely
with the number of observations. In analogy with R. A. Fisher’s
information function, the amount of information contained in the first n observations
regarding $\theta$ is defined. A sufficient condition for the non-existence of a
uniformly consistent estimate of $\theta$ is given in section 3 in terms of the information
function. Section 4 gives a simplified expression for the amount of information
when the successive observations are independent.

1. Introduction. J. Neyman has recently treated the following estimation
problem.¹ Let $X_1, X_2, \ldots$, etc. be a sequence of independent chance variables
the distribution of each of which depends on some unknown parameters. Two
kinds of parameters are distinguished: structural and incidental parameters. A
parameter $\theta$ is called structural if there exists an infinite subsequence of the
sequence $\{X_i\}$ such that the distribution of each of the chance variables in the
subsequence depends on $\theta$. Any parameter which is not structural is called
incidental. Neyman has considered the case when there are a finite number of
structural parameters, say $\theta_1, \ldots, \theta_k$, and an infinite sequence $\{\xi_i\}$,
(i = 1, 2, ..., ad inf.), of incidental parameters. He has studied the problem of consistent
and efficient estimation of the structural parameters and has obtained several
interesting results. He has shown, among others, that the maximum likelihood
estimate of a structural parameter $\theta$ need not be consistent, even when consistent
estimates of $\theta$ exist. Neyman has also given a method for obtaining consistent
estimates of the structural parameters. This method, however, is applicable
only under certain restrictive conditions.

In this paper we shall consider a more general case than that treated by Neyman,
but we shall concentrate on one aspect of the problem, namely that of the
existence of consistent estimates.

Let $\{X_i\}$, (i = 1, 2, ..., ad inf.), be a sequence of chance variables, not
necessarily independent of each other. It is assumed that for each n the chance
variables $X_1, \ldots, X_n$ admit a joint probability density function
$p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ where $\theta, \xi_1, \xi_2, \ldots$, etc. are unknown parameters.²

¹ Address given by J. Neyman at the meeting of the Institute of Mathematical Statistics
in Atlantic City, January, 1947.

² While $\theta$ is assumed to be a real variable, we admit $\xi_i$ to be a finite dimensional vector,
i.e., $\xi_i = (\xi_{i1}, \ldots, \xi_{ik_i})$ where $k_i$ may be any finite positive integer.



We shall require that the consistency relations among the density functions
$p_1, p_2, \ldots$, etc. be fulfilled, i.e.,

(1.1)    $\int_{-\infty}^{+\infty} p_{n+1}\, dx_{n+1} = p_n$,    (n = 1, 2, ..., ad inf.).

It should be remarked that it is not postulated that $p_n$ actually depends on all the
parameters that appear as arguments in $p_n$. It is merely assumed that $p_n$
does not depend on any parameter that does not appear as an argument in $p_n$,
i.e., $p_n$ does not depend on $\xi_i$ for any i > n. It follows, however, from (1.1) that
if $p_n$ depends on a parameter $\zeta$ then also $p_m$ depends on $\zeta$ for any m > n.

Neyman’s definition of structural and incidental parameters can be extended
to the case of dependent observations considered here by saying that the distribution
of $X_i$ does not depend on a parameter $\zeta$ if and only if the conditional
distribution of $X_i$ for any given values of $X_1, \ldots, X_{i-1}$ does not depend on $\zeta$.

It is not postulated that each of the parameters $\xi_1, \xi_2, \ldots$, etc. is incidental;
some of them may be structural. We shall not make an explicit distinction
between structural and incidental parameters, since for the purposes of the
present paper this does not seem to be necessary.

In this paper we shall deal with the problem of formulating conditions under
which a uniformly consistent estimate of $\theta$ exists. A statistic $t_n(x_1, \ldots, x_n)$ is
said to be a uniformly consistent estimate of $\theta$ if for any positive $\delta$

(1.2)    $\lim_{n \to \infty} \mathrm{prob}\,\{|t_n - \theta| < \delta\} = 1$

uniformly in $\theta$ and the $\xi$’s.

In section 2 a necessary and sufficient condition is given for the existence of a
uniformly consistent estimate of $\theta$. In section 3 the amount of information
supplied by the first n observations concerning $\theta$ is defined. It is then shown
that if the amount of information is a bounded function of n over a non-degenerate
$\theta$-interval, no uniformly consistent estimate of $\theta$ exists. Section 4 gives a
simplified formula for the amount of information in the case when the X’s are
independently distributed.

2. A necessary and sufficient condition for the existence of a uniformly
consistent estimate of $\theta$. In deriving a necessary and sufficient condition for
the existence of a uniformly consistent estimate of $\theta$, use will be made of some
results contained in a publication of the author [1] dealing with statistical decision
functions which minimize the maximum risk. In [1] it is assumed that the
domain of each of the unknown parameters is a closed and bounded set and that
$p_n$ is continuous jointly in all of its arguments. Thus, in order to be able to use
the results obtained in [1], we shall have to make the same assumptions here.
In what follows we shall, therefore, assume that each of the parameters $\theta, \xi_1,
\xi_2, \ldots$, etc. is restricted to a finite closed interval and that $p_n$ is a continuous
function of $x_1, \ldots, x_n, \theta, \xi_1, \ldots, \xi_n$.

Let [a, b] (a < b) be the $\theta$-interval to which the values of $\theta$ are restricted.
Clearly, if $t_n(x_1, \ldots, x_n)$, (n = 1, 2, ..., ad inf.), is a uniformly consistent
estimate of $\theta$, then also $t_n^*$ is a uniformly consistent estimate of $\theta$, where $t_n^* = t_n$
when $a \le t_n \le b$, $t_n^* = a$ when $t_n < a$ and $t_n^* = b$ when $t_n > b$. Thus, without
loss of generality, we can restrict ourselves to estimates $t_n$ which can take values
only in the interval [a, b]. Uniform consistency of $t_n$ is then equivalent with the
condition

(2.1)    $\lim_{n \to \infty} E[(t_n - \theta)^2 \mid \theta, \xi_1, \ldots, \xi_n] = 0$

uniformly in $\theta$ and the $\xi$’s. For any chance variable u the symbol
$E(u \mid \theta, \xi_1, \xi_2, \ldots)$ denotes the expected value of u when $\theta, \xi_1, \xi_2, \ldots$ are the
true parameter values.

In [1] a non-negative function $W(t, \theta)$, called weight function, is introduced
which expresses the loss suffered when t is the value of the estimate and $\theta$ is the
true value of the parameter. The risk is defined in [1] as the expected value of
the loss, i.e., the risk is given by

(2.2)    $r_n(\theta, \xi_1, \ldots, \xi_n) = E[W(t_n, \theta) \mid \theta, \xi_1, \ldots, \xi_n]$.

If we put $W(t, \theta) = (t - \theta)^2$, we have

(2.3)    $r_n(\theta, \xi_1, \ldots, \xi_n) = E[(t_n - \theta)^2 \mid \theta, \xi_1, \ldots, \xi_n]$.

It can easily be verified that Assumptions 1-4 in section 3 of [1] are fulfilled
for the weight function $W(t, \theta) = (t - \theta)^2$.³ Thus, all results obtained in
[1] can be applied to the risk function given in (2.3). According to Theorem 4.1
in [1] the risk function given in (2.3) is a continuous function of $\theta, \xi_1, \ldots, \xi_n$
for any arbitrary estimate $t_n$. We shall denote the maximum of (2.3) with respect
to $\theta, \xi_1, \ldots, \xi_n$ by $r_n[t_n]$. Thus $r_n[t_n]$ is a functional which associates a
non-negative value with any estimate function $t_n$.

It follows from (2.1) that $t_n$ is a uniformly consistent estimate of $\theta$ if and only
if

(2.4)    $\lim_{n \to \infty} r_n[t_n] = 0$.

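For any $\theta$ and for any $n$ let $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ be a cumulative distribution
function of $\xi_1, \ldots, \xi_n$. Let, furthermore,

(2.5) $q_n(x_1, \ldots, x_n \mid \theta, F_n) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dF_n(\xi_1, \ldots, \xi_n \mid \theta)$.

We do not require that $F_1, F_2, \ldots$, etc. satisfy the consistency relations, i.e.,
$\lim_{\xi_{n+1} \to \infty} F_{n+1}(\xi_1, \ldots, \xi_{n+1} \mid \theta)$ is not necessarily equal to $F_n(\xi_1, \ldots, \xi_n \mid \theta)$.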

$^1$ In verifying Assumption 4, we may assume that $p_n$ is always $> 0$, since for any given
values $\theta, \xi_1, \ldots, \xi_n$ we may restrict the domain of $(x_1, \ldots, x_n)$ to the subset of the sample
space where $p_n > 0$.



PARAMETER ESTIMATION




Hence, also the distributions $q_n$ do not necessarily satisfy the consistency relations.
Clearly

(2.6) $r_n[t_n] \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)^2\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n$

for any $\theta$ and any $F_n$. Hence, (2.4) and (2.6) imply that if $t_n$ is a uniformly
consistent estimate of $\theta$, then $t_n$ remains a uniformly consistent estimate of $\theta$
also when $q_n$ is the distribution of $X_1, \ldots, X_n$, for any arbitrary choice of $F_n$.
For each $n$ let $G_n(\theta, \xi_1, \ldots, \xi_n)$ be a joint cumulative distribution function of
$\theta, \xi_1, \ldots, \xi_n$. If this is regarded as an a priori distribution of $\theta, \xi_1, \ldots, \xi_n$,
and if our aim is to choose $t_n$ so that

(2.7) $E(t_n - \theta)^2 = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)^2\, p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dG_n\, dx_1 \cdots dx_n$

is a minimum, then the best choice of $t_n$ is to put it equal to the a posteriori mean
value of $\theta$. Let $t_n(x_1, \ldots, x_n; G_n)$ denote the a posteriori mean value of $\theta$
when $G_n$ is the a priori distribution, i.e.,

(2.8) $t_n(x_1, \ldots, x_n; G_n) = \dfrac{\int \theta\, p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dG_n}{\int p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dG_n}$

where the integration is to be taken over the whole domain of the parameters
$\theta, \xi_1, \ldots, \xi_n$. Let, furthermore, $\bar{r}_n[G_n]$ denote the value of (2.7) when $t_n =
t_n(x_1, \ldots, x_n; G_n)$. According to Theorem 4.4 in [1] there exists a particular
distribution $G_n^0$, called a least favorable distribution, such that

(2.9) $\bar{r}_n[G_n] \le \bar{r}_n[G_n^0]$

for all $G_n$. Let

(2.10) $t_n^0(x_1, \ldots, x_n) = t_n(x_1, \ldots, x_n; G_n^0)$.

It follows from Theorems (4.5) and (5.1) in [1] that for any estimate $t_n$ we have

(2.11) $r_n[t_n] \ge r_n[t_n^0] = \bar{r}_n[G_n^0]$.

Hence, a necessary and sufficient condition for the existence of a uniformly
consistent estimate of $\theta$ is that

(2.12) $\lim_{n \to \infty} \bar{r}_n[G_n^0] = 0$.

Let $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ denote the conditional cumulative distribution of
$\xi_1, \ldots, \xi_n$ for given $\theta$ that results from the joint distribution $G_n(\theta, \xi_1, \ldots, \xi_n)$
and let $F_n^0(\xi_1, \ldots, \xi_n \mid \theta)$ correspond to $G_n^0(\theta, \xi_1, \ldots, \xi_n)$. Clearly, any
uniformly consistent estimate of $\theta$ with respect to $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$
is a uniformly consistent estimate also with respect to $q_n(x_1, \ldots, x_n \mid \theta, F_n)$
for any $F_n$. On the other hand, if $q_n(x_1, \ldots, x_n \mid \theta, F_n^0)$ admits a uniformly
consistent estimate of $\theta$, equation (2.12) must hold and, therefore,
$p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ admits a uniformly consistent estimate of $\theta$. Hence we
arrive at the following theorem:

Theorem 2.1. A necessary and sufficient condition that
$p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$
admit a uniformly consistent estimate of $\theta$ is that $q_n(x_1, \ldots, x_n \mid \theta, F_n)$ admit a
uniformly consistent estimate of $\theta$ for any arbitrary choice of $F_n$.

3. Amount of information contained in the first n observations concerning
the parameter $\theta$. We shall make the following assumptions:

Assumption 1. The first two derivatives of $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$
with respect to $\theta$ exist.

Assumption 2. We have 


(3.1) 

and 



dxi • • ■ dxn < °o 


( 32 ) 



dXi • • • dXn < 


CO 


for any n. 

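Assumption 3. The integral

$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n)}{\partial \theta^2}\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n$

exists for any $\theta$, $F_n$ and $n$, where $q_n$ is defined by (2.5).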
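Since

$\dfrac{\partial^2 \log q_n}{\partial \theta^2} = \dfrac{1}{q_n} \dfrac{\partial^2 q_n}{\partial \theta^2} - \left( \dfrac{\partial \log q_n}{\partial \theta} \right)^2$

and since, because of Assumptions 1 and 2,

$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{\partial^2 q_n}{\partial \theta^2}\, dx_1 \cdots dx_n = 0,$

we have

(3.3) $-\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \dfrac{\partial \log q_n}{\partial \theta} \right)^2 q_n\, dx_1 \cdots dx_n.$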






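Let

(3.4) $c_n(\theta) = \underset{F_n}{\mathrm{g.l.b.}} \left[ -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n \right].$

Clearly $c_n(\theta) \ge 0$. We shall now show that

(3.5) $c_{n+1}(\theta) \ge c_n(\theta)$ for $n = 1, 2, \ldots$, ad inf.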

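In fact, we can write

(3.6) $-\dfrac{\partial^2 \log q_{n+1}(x_1, \ldots, x_{n+1} \mid \theta, F_{n+1})}{\partial \theta^2} = -\dfrac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n^*)}{\partial \theta^2} - \dfrac{\partial^2 \log f_{n+1}(x_{n+1} \mid x_1, \ldots, x_n, \theta, F_{n+1})}{\partial \theta^2}$

where $F_n^* = \lim_{\xi_{n+1} \to \infty} F_{n+1}(\xi_1, \ldots, \xi_{n+1} \mid \theta)$ and $f_{n+1}(x_{n+1} \mid x_1, \ldots, x_n, \theta, F_{n+1})$
is the conditional probability density function of $X_{n+1}$ given the values of $x_1,
\ldots, x_n$ and assuming that the joint density function of $X_1, \ldots, X_{n+1}$ is given
by $q_{n+1}(x_1, \ldots, x_{n+1} \mid \theta, F_{n+1})$. Since $c_n(\theta) \le$ expected value of

$-\dfrac{\partial^2 \log q_n(X_1, \ldots, X_n \mid \theta, F_n^*)}{\partial \theta^2}$

and since the expected value of $-\dfrac{\partial^2 \log f_{n+1}}{\partial \theta^2}$ is $\ge 0$, inequality (3.5) must hold.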

In analogy with R. A. Fisher's information function, we shall call $c_n(\theta)$ the
amount of information contained in the first $n$ observations regarding $\theta$. We
shall now prove the following theorem:

Theorem 3.1. If $\lim_{n \to \infty} c_n(\theta) \le c < \infty$ over a finite non-degenerate $\theta$-interval $I$,
then there is no uniformly consistent estimate of $\theta$.

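Proof. If for any $n$, $c_n(\theta) \le c < \infty$ over the interval $I$, for each $n$ there exists
a distribution $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ such that

(3.7) $0 \le -\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \dfrac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n)}{\partial \theta^2}\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n \le c + 1$

for all $n$ and for all $\theta$ in $I$. Let $t_n$ be any estimate and let

(3.8) $b_n(\theta) = E(t_n - \theta) = \int \cdots \int (t_n - \theta)\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n = \int \cdots \int t_n\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n - \theta.$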

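Since $t_n$ is bounded, it follows from Assumptions 1 and 2 that $\dfrac{d b_n(\theta)}{d\theta}$ exists and is
a continuous function of $\theta$. According to a theorem by Cramér [2] we have

(3.9) $E(t_n - \theta)^2 = \int \cdots \int (t_n - \theta)^2 q_n\, dx_1 \cdots dx_n \ge \dfrac{\left(1 + \dfrac{d b_n(\theta)}{d\theta}\right)^2}{c + 1}$

for all $\theta$ in $I$. Thus, in order that $\lim E(t_n - \theta)^2 = 0$ uniformly in $\theta$, we must
have

(3.10) $\lim_{n \to \infty} \dfrac{d b_n(\theta)}{d\theta} = -1$

uniformly in $\theta$ over $I$. Let $I$ be the interval ranging from $g$ to $h$ $(g < h)$. From
(3.10) it follows that

(3.11) $\lim_{n \to \infty} [b_n(h) - b_n(g)] = g - h.$

Hence

$\liminf_{n \to \infty} \max_{\theta \text{ in } I} [b_n(\theta)]^2 \ge \dfrac{(h - g)^2}{4}.$

Since $E(t_n - \theta)^2 \ge [b_n(\theta)]^2$, $E(t_n - \theta)^2$ cannot converge to zero uniformly in $\theta$
and Theorem 3.1 is proved.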

4. Formula for $c_n(\theta)$ when $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ is equal to
$\varphi_1(x_1 \mid \theta, \xi_1)\, \varphi_2(x_2 \mid \theta, \xi_2) \cdots \varphi_n(x_n \mid \theta, \xi_n)$. Let $g_i(x_i \mid x_1, \ldots, x_{i-1}, \theta, F_n)$ be the
conditional probability density of $X_i$ given $x_1, \ldots, x_{i-1}$ when the joint density
function of $X_1, \ldots, X_n$ is given by $q_n(x_1, \ldots, x_n \mid \theta, F_n)$, $(i \le n)$. Clearly,



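Now

(4.2) $g_i(x_i \mid x_1, \ldots, x_{i-1}, \theta, F_n) = \int \varphi_i(x_i \mid \theta, \xi_i)\, dH_i(\xi_i \mid x_1, \ldots, x_{i-1}, \theta, F_n)$

where $H_i(\xi_i \mid x_1, \ldots, x_{i-1}, \theta, F_n)$ denotes the conditional cumulative distribution
of $\xi_i$ given $x_1, \ldots, x_{i-1}$, assuming that $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ is the joint cumulative
distribution of $\xi_1, \ldots, \xi_n$ and $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ is the joint
density of $X_1, \ldots, X_n$ for any given values of $\theta, \xi_1, \ldots, \xi_n$.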

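It follows from (4.2) that

$-\int_{-\infty}^{\infty} \dfrac{\partial^2 \log g_i}{\partial \theta^2}\, g_i\, dx_i \ge c_{ni}(\theta) = \underset{G_i(\xi_i)}{\mathrm{g.l.b.}} \left[ -\int_{-\infty}^{\infty} \dfrac{\partial^2 \log \int \varphi_i(x_i \mid \theta, \xi_i)\, dG_i(\xi_i)}{\partial \theta^2} \int \varphi_i(x_i \mid \theta, \xi_i)\, dG_i(\xi_i)\, dx_i \right]$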




where $G_i(\xi_i)$ may be any cumulative distribution of $\xi_i$. Hence

(4.3) = 
and, therefore, 

(4.4) $c_n(\theta) = \sum_{i=1}^{n} c_{ni}(\theta)$.

The quantity $c_{ni}(\theta)$ is simply the amount of information contained in the $i$th
observation alone. Thus, formula (4.4) says that if $X_1, \ldots, X_n$ are independent,
the total information contained in the first $n$ observations is equal to
the sum of the amounts of information contained in each of these observations
singly.
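The additivity stated in (4.4) can be checked numerically in a simplified setting. The sketch below assumes there are no nuisance parameters, so that the greatest lower bound in $c_{ni}(\theta)$ is not needed and the quantity reduces to the ordinary Fisher information $-E[\partial^2 \log f / \partial \theta^2]$. The normal model, the function names and the grid sizes are illustrative assumptions, not part of the paper.

```python
import math


def normal_logpdf(x, theta, sigma):
    """Log density of N(theta, sigma^2) at x (assumed illustrative model)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - theta) ** 2 / (2.0 * sigma ** 2)


def second_theta_derivative(logpdf, theta, h=1e-3):
    """Central-difference estimate of the second derivative of logpdf at theta."""
    return (logpdf(theta + h) - 2.0 * logpdf(theta) + logpdf(theta - h)) / h ** 2


def grid(theta, sigma, steps, width=8.0):
    """Trapezoidal nodes and weights on [theta - width*sigma, theta + width*sigma]."""
    dx = 2.0 * width * sigma / steps
    for k in range(steps + 1):
        weight = dx * (0.5 if k in (0, steps) else 1.0)
        yield theta - width * sigma + k * dx, weight


def information_single(theta, sigma, steps=400):
    """Information about theta in one observation, by numerical integration."""
    total = 0.0
    for x, w in grid(theta, sigma, steps):
        d2 = second_theta_derivative(lambda t: normal_logpdf(x, t, sigma), theta)
        total += -d2 * math.exp(normal_logpdf(x, theta, sigma)) * w
    return total


def information_pair(theta, sigma1, sigma2, steps=120):
    """Information in two independent observations, integrating the joint density."""
    def joint(x1, x2, t):
        return normal_logpdf(x1, t, sigma1) + normal_logpdf(x2, t, sigma2)

    total = 0.0
    for x1, w1 in grid(theta, sigma1, steps):
        for x2, w2 in grid(theta, sigma2, steps):
            d2 = second_theta_derivative(lambda t: joint(x1, x2, t), theta)
            total += -d2 * math.exp(joint(x1, x2, theta)) * w1 * w2
    return total


if __name__ == "__main__":
    theta = 0.3
    i1 = information_single(theta, 1.0)
    i2 = information_single(theta, 2.0)
    i12 = information_pair(theta, 1.0, 2.0)
    print(i1, i2, i12)  # the joint value should agree with i1 + i2
```

For the normal model the single-observation information is $1/\sigma^2$, so the joint value for $\sigma_1 = 1$, $\sigma_2 = 2$ should be close to $1.25$.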

REFERENCES 

[1] A. Wald, "Statistical decision functions which minimize the maximum risk", Annals
of Math., Vol. 46 (1946).

[2] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.



INVERSION FORMULAE FOR THE DISTRIBUTION OF RATIOS 

By John Gurland 
University of California, Berkeley

1. Summary. The use of the repeated Cauchy principal value affords greater
facility in the application of inversion formulae involving characteristic functions.
Formula (2) below is especially useful in obtaining the inversion formula
(1) for the distribution of the ratio of linear combinations of random variables
which may be correlated. Formulae (1), (10), (12) generalize the special cases
considered by Cramér [2], Curtiss [4], Geary [6], and are free of some restrictions
they impose. The results are further generalized in section 6, where inversion
formulae are given for the joint distribution of several ratios. In section 7, the
joint distribution of several ratios of quadratic forms in random variables
$X_1, X_2, \ldots, X_n$ having a multivariate normal distribution is considered.


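2. Introduction. We shall write

$\mathrm{P}\!\!\int \cdots \int g(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n = \lim_{\substack{\epsilon_1, \ldots, \epsilon_n \to 0 \\ T_1, \ldots, T_n \to \infty}} \left( \int_{-T_1}^{-\epsilon_1} + \int_{\epsilon_1}^{T_1} \right) \cdots \left( \int_{-T_n}^{-\epsilon_n} + \int_{\epsilon_n}^{T_n} \right) g(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n,$

which might be called the repeated Cauchy principal value of

$\int \int \cdots \int g(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n,$

and which we shall use frequently. The results of this article may be regarded
as extensions of the following theorem proved in section 4.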

Theorem 1. Let $X_1, X_2, \ldots, X_n$ have the joint distribution function
$F(x_1, x_2, \ldots, x_n)$ with corresponding characteristic function $\phi(t_1, t_2, \ldots, t_n)$.
Let $G(x)$ be the distribution function of $(a_1 X_1 + \cdots + a_n X_n)/(b_1 X_1 + \cdots + b_n X_n)$,
where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers. If

$P\{\sum_j b_j X_j \le 0\} = 0,$

then

(1) $G(x) + G(x - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi(t(a_1 - b_1 x), \ldots, t(a_n - b_n x))}{t}\, dt.$


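3. An inversion formula for distribution functions. Let $F(x)$ be a distribution
function and $\phi(t)$ be the corresponding characteristic function. Then the
following inversion formula holds:

(2) $F(\xi) + F(\xi - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{e^{-i \xi t} \phi(t)}{t}\, dt.$

Proof.

$\dfrac{1}{\pi i} \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \dfrac{e^{-i \xi t} \phi(t)}{t}\, dt = \dfrac{1}{\pi i} \int_{-\infty}^{\infty} dF(x) \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \dfrac{e^{i t (x - \xi)}}{t}\, dt$

by the Fubini theorem on the inversion of integrals. But

$\dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{e^{i y t}}{t}\, dt = \dfrac{2}{\pi} \int_{0}^{\infty} \dfrac{\sin y t}{t}\, dt = \operatorname{sgn} y,$

where $\operatorname{sgn} y = -1, 0, 1$ according as $y < 0$, $y = 0$, $y > 0$. Since $\left| \int_{-T}^{T} \dfrac{\sin a t}{t}\, dt \right|$ is
uniformly bounded in $T$, the principle of bounded convergence for Lebesgue
integrals implies that

$\lim_{\epsilon \to 0,\, T \to \infty} \dfrac{1}{\pi i} \int_{-\infty}^{\infty} dF(x) \left( \int_{-T}^{-\epsilon} + \int_{\epsilon}^{T} \right) \dfrac{e^{i t (x - \xi)}}{t}\, dt = \left( \int_{-\infty}^{\xi - 0} + \int_{\xi - 0}^{\xi} + \int_{\xi}^{\infty} \right) \operatorname{sgn}(x - \xi)\, dF(x) = -F(\xi - 0) + 1 - F(\xi).$

The required result follows at once.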

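Another form of (2) may be obtained as follows: Let $H(x)$, $K(x)$ be distribution
functions, and $\psi(t)$, $\chi(t)$ the corresponding characteristic functions. Setting
$F = H$, $\phi = \psi$, $\xi = 0$ in (2) yields

$H(0) + H(0 - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\psi(t)}{t}\, dt,$

while setting $F = K$, $\phi = \chi$, (2) yields

$K(\xi) + K(\xi - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{e^{-i \xi t} \chi(t)}{t}\, dt.$

Clearly

(3) $K(\xi) + K(\xi - 0) = H(0) + H(0 - 0) + \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\psi(t) - e^{-i \xi t} \chi(t)}{t}\, dt.$

If $H = K$, then $\psi = \chi$, and (3) reduces to a well-known inversion formula (cf.
Kendall [7, p. 91]).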
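As a numerical illustration of formula (2), the following sketch evaluates $\tfrac{1}{2}[F(\xi) + F(\xi - 0)]$ from a characteristic function. It assumes $\phi(-t) = \overline{\phi(t)}$, which holds for any characteristic function, so that the symmetric principal-value integral folds onto $t > 0$ and becomes $\tfrac{2}{\pi} \int_0^\infty \operatorname{Im}(e^{-i \xi t} \phi(t))/t\, dt$. The standard normal test case, the truncation point `upper` and the function names are illustrative choices, not taken from the paper.

```python
import cmath
import math


def normal_cf(t):
    """Characteristic function of the standard normal distribution."""
    return cmath.exp(-0.5 * t * t)


def cdf_from_cf(cf, xi, upper=12.0, steps=20000):
    """Approximate [F(xi) + F(xi - 0)] / 2 using inversion formula (2).

    The folded integrand Im(exp(-i*xi*t) * cf(t)) / t stays bounded near
    t = 0, so a midpoint rule on (0, upper) is adequate when cf decays fast.
    """
    dt = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += (cmath.exp(-1j * xi * t) * cf(t)).imag / t * dt
    return 0.5 - total / math.pi


if __name__ == "__main__":
    for xi in (-0.7, 0.0, 1.0):
        exact = 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))
        print(xi, cdf_from_cf(normal_cf, xi), exact)
```

For a characteristic function that decays slowly, `upper` and `steps` would need to be increased, and the truncation error should be examined separately.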


4. Distribution of the ratio $(a_1 X_1 + \cdots + a_n X_n)/(b_1 X_1 + \cdots + b_n X_n)$
with denominator positive. Theorem 1. Let $X_1, X_2, \ldots, X_n$ have the joint
distribution function $F(x_1, x_2, \ldots, x_n)$ with corresponding characteristic function
$\phi(t_1, t_2, \ldots, t_n)$. Let $G(x)$ be the distribution function of $(a_1 X_1 + \cdots + a_n X_n)/
(b_1 X_1 + \cdots + b_n X_n)$ where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers. If
$P\{\sum_j b_j X_j \le 0\} = 0$, then

$G(x) + G(x - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi(t(a_1 - b_1 x), \ldots, t(a_n - b_n x))}{t}\, dt.$

Proof. Note that

$P\left\{ \dfrac{\sum_j a_j X_j}{\sum_j b_j X_j} \le x \right\} = P\{\sum_j (a_j - b_j x) X_j \le 0\},$

and let $P_x(\xi) = P\{\sum_j (a_j - b_j x) X_j \le \xi\}$ and $\chi_x(t)$ be the corresponding
characteristic function. Clearly $P_x(0) = G(x)$ and

$\chi_x(t) = \phi(t(a_1 - b_1 x), \ldots, t(a_n - b_n x)).$

On applying (2) to $P_x(\xi)$ and setting $\xi = 0$, the required result follows at once.

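If (3) is applied in place of (2), with $K = P_x$, we obtain

(4) $G(x) + G(x - 0) = H(0) + H(0 - 0) + \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\psi(t) - \phi(t(a_1 - b_1 x), \ldots, t(a_n - b_n x))}{t}\, dt.$

We shall consider (1) and (4) when $n = 2$ and

$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$

Two cases will be treated separately: first, when $X_1$, $X_2$ are independent; second,
when $X_1$, $X_2$ may be correlated.

If $X_1$, $X_2$ are independent, and $F(x_1, \infty) = F_1(x_1)$, $F(\infty, x_2) = F_2(x_2)$, with
corresponding characteristic functions $\phi_1(t)$, $\phi_2(t)$, then (1) becomes

(5) $G(x) + G(x - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi_1(t)\, \phi_2(-t x)}{t}\, dt,$

while (4) becomes, taking $H = F_2$,

(6) $G(x) + G(x - 0) = \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi_2(t) - \phi_1(t)\, \phi_2(-t x)}{t}\, dt.$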

Cramér [2, p. 46] proves, for $X_1$, $X_2$ independent and $F_2(0) = 0$, that

Q(x) = -L f* ~ 

2Tri t 

under the following conditions: 


(i) $X_1$ and $X_2$ have finite means;



dt < 00 . 





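If $X_1$, $X_2$ may be correlated, then (1) becomes

(7) $G(x) + G(x - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi(t, -t x)}{t}\, dt,$

while (4) becomes, taking $H = F_2$,

(8) $G(x) + G(x - 0) = \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi_2(t) - \phi(t, -t x)}{t}\, dt.$

Professor P. L. Hsu, in a course of lectures attended by the author at the
Statistical Laboratory, University of California, gave the following result of
Cramér, which was stated thus, using the above notation:

(9) $G(x) = \dfrac{1}{2 \pi i} \int_{-\infty}^{\infty} \dfrac{\phi_2(t) - \chi_x(t)}{t}\, dt,$

provided $\int_{-\infty}^{\infty} \left| \dfrac{\phi_2(t) - \chi_x(t)}{t} \right| dt < \infty$, where $F_2(0) = 0$
and $\chi_x(t)$ is defined above expression (4).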

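The following corollary is obtained from (1) according to well-known theorems
concerning differentiation under the integral sign:

Corollary. Suppose $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding
to $X_1, X_2, \ldots, X_n$, and $G(x)$ is the distribution function of

$(a_1 X_1 + \cdots + a_n X_n)/(b_1 X_1 + \cdots + b_n X_n);$

then, if $P\{\sum_j b_j X_j \le 0\} = 0$,

(10) $G'(x) = \dfrac{1}{2 \pi i}\, \mathrm{P}\!\!\int \sum_{j=1}^{n} b_j \left[ \dfrac{\partial \phi(t_1, \ldots, t_n)}{\partial t_j} \right]_{t_k = t(a_k - b_k x)} dt,$

in every interval in which the integral converges uniformly.
If $n = 2$, and

$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$

then

(11) $G'(x) = \dfrac{1}{2 \pi i}\, \mathrm{P}\!\!\int \left[ \dfrac{\partial \phi(t_1, t_2)}{\partial t_2} \right]_{t_1 = t,\; t_2 = -t x} dt.$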


Cramér [3, p. 317, exercise 6] states the following result:

If $F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(u, v)\, du\, dv$, and $F_2(0) = 0$, then


if the integral is uniformly convergent with respect to x. 





l»Xl pX2 

Geary [6] has shown that if F{x \, a:*) = / / /(«, v) du du, Fz(0) = 0, and 

CO 

-00 

~ J w) du, then 




dli f 


provided 

(i) 

(ii) 


4>{ii 4) = 0 for ti = ± CO, 

( dy f yXit, y)e~'‘’^ dt = f dt [ j/X(<, y)e“’“'* dy. 

‘'0 J— M to ^0 


Formula (1) can be employed in the case $n = 2$, $X_1$, $X_2$ are independent, and

$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

to obtain closed expressions for the distribution functions of ratios in which the
variable in the numerator and that in the denominator may have any one of
the following four distributions: Binomial, Rectangular, $\chi^2$, Normal. In the
case of the four ratios with the binomially distributed variable as the denominator,
a translation must be made to ensure positiveness of the denominator.
For the four ratios with the normally distributed variable as denominator, the
distribution function obtained is approximate; and the approximation is good
if $P\{X_2 \le 0\}$ is sufficiently small (cf. Geary [5]).


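5. Distribution of the ratio $(a_1 X_1 + \cdots + a_n X_n)/(b_1 X_1 + \cdots + b_n X_n)$, with
denominator positive or negative. The following theorem will be proven:

Theorem 2. Let $G(x)$ be the distribution function of $(a_1 X_1 + \cdots + a_n X_n)/
(b_1 X_1 + \cdots + b_n X_n)$ where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers.
If $P\{\sum_j b_j X_j = 0\} = 0$, then

(12) $G(x) + G(x - 0) = 1 - \dfrac{1}{\pi i}\, \mathrm{P}\!\!\int \dfrac{\phi^{+}\{t(a_1 - b_1 x), \ldots, t(a_n - b_n x)\} + \phi^{-}\{t(b_1 x - a_1), \ldots, t(b_n x - a_n)\}}{t}\, dt,$

where

$\phi^{+}(t_1, t_2, \ldots, t_n) = \int_{\sum_j b_j x_j > 0} e^{i(t_1 x_1 + \cdots + t_n x_n)}\, dF(x_1, x_2, \ldots, x_n),$

$\phi^{-}(t_1, t_2, \ldots, t_n) = \int_{\sum_j b_j x_j < 0} e^{i(t_1 x_1 + \cdots + t_n x_n)}\, dF(x_1, x_2, \ldots, x_n).$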







Proof. Let

$P_x(\xi) = P\{\sum_k b_k X_k > 0\} \cdot P\{\sum_k X_k (a_k - b_k x) \le \xi \mid \sum_k b_k X_k > 0\}$
$\qquad + P\{\sum_k b_k X_k < 0\} \cdot P\{\sum_k X_k (a_k - b_k x) \ge -\xi \mid \sum_k b_k X_k < 0\}.$

Then $P_x(\infty) = 1$, $P_x(-\infty) = 0$, and $P_x(\xi)$ is non-decreasing in $\xi$ and continuous
on the right. Hence $P_x(\xi)$ is a distribution function (Cramér [2, p. 11]). It can
be shown by a proof analogous to that used by Curtiss [4] that the characteristic
function of $P_x(\xi)$ is

$\phi^{+}\{t(a_1 - b_1 x), \ldots, t(a_n - b_n x)\} + \phi^{-}\{t(b_1 x - a_1), \ldots, t(b_n x - a_n)\}.$

Since $P_x(0) = G(x)$, application of (2) to $P_x(\xi)$ yields the required result.


6. Inversion formulae for multidimensional distribution functions. The
n-dimensional analogue of (2) will now be given, and will be applied to obtain
inversion formulae for the joint distribution of several ratios.

Let $X_1, X_2, \ldots, X_n$ have the joint distribution function $F(x_1, x_2, \ldots, x_n)$
and the corresponding characteristic function $\phi(t_1, t_2, \ldots, t_n)$. Let

$\phi_{j_1, j_2, \ldots, j_k}(t_1, t_2, \ldots, t_k)$

be the characteristic function corresponding to the marginal joint distribution
function of $X_{j_1}, X_{j_2}, \ldots, X_{j_k}$, where the set $j_1, j_2, \ldots, j_k$ is a permutation
of $k$ of the integers $1, 2, \ldots, n$. Note that

$\phi(t_1, t_2, \ldots, t_n) = \phi_{1, 2, \ldots, n}(t_1, t_2, \ldots, t_n).$

The summation $\sum_{(i_1, i_2, \ldots, i_n)} F(\xi_{1 i_1}, \xi_{2 i_2}, \ldots, \xi_{n i_n})$, which will appear below, is
to be interpreted as follows:

Defining $\xi_{j i_j} = \xi_j$ if $i_j = 1$,
$\xi_{j i_j} = \xi_j - 0$ if $i_j = 0$,

then $\sum_{(i_1, i_2, \ldots, i_n)} F(\xi_{1 i_1}, \xi_{2 i_2}, \ldots, \xi_{n i_n})$ will mean that the summation is to be taken
over all binary numbers $i_1 i_2 \cdots i_n$.

Using the notation of the preceding paragraph, we can state the following
theorem.

Theorem 3. Let Ao , Ai , ■ • ■ , An satisfy the n + 1 equations 


= 1. An=~l, (r = 0,l,2, •■•,71- 1), 


where 

Then 

( 13 ) 


as usual, denotes the binomial coefficient. 


(:) 

(_l)nH.l £ ^ £ {{■■■{ 

■.*«) ■ M n<ii<. ••<ik J J J 


-exp( —+ ' ■ + ,k , ■ ■ ,tk) 


dil dti * • • dtf( 
ilii • • • tk 





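Proof. Since the theorem is already proved for $n = 1$ (section 3), and since

$\dfrac{1}{(\pi i)^n}\, \mathrm{P}\!\!\int \cdots \int \exp\{-i(t_1 \xi_1 + \cdots + t_n \xi_n)\}\, \phi(t_1, \ldots, t_n)\, \dfrac{dt_1\, dt_2 \cdots dt_n}{t_1 t_2 \cdots t_n}$
$\qquad = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \operatorname{sgn}(x_1 - \xi_1)\, \operatorname{sgn}(x_2 - \xi_2) \cdots \operatorname{sgn}(x_n - \xi_n)\, dF(x_1, \ldots, x_n),$

the theorem could be proven by induction. The result is obtained more quickly,
however, by noting that it suffices to consider the case of independent
$X_1, X_2, \ldots, X_n$.

It may be remarked that if $(\xi_1, \xi_2, \ldots, \xi_n)$ is a continuity point of
$F(x_1, x_2, \ldots, x_n)$, the left-hand member of (13) becomes

$(-1)^{n+1} 2^n F(\xi_1, \xi_2, \ldots, \xi_n),$

and also that differentiation of (13) yields

(14) $\dfrac{\partial^n F(\xi_1, \xi_2, \ldots, \xi_n)}{\partial \xi_1\, \partial \xi_2 \cdots \partial \xi_n} = \dfrac{1}{(2 \pi)^n} \int \cdots \int \exp\{-i(t_1 \xi_1 + \cdots + t_n \xi_n)\}\, \phi(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n,$

in every n-dimensional interval in which the integral converges uniformly.
This agrees with well-known results concerning Fourier inversion formulae.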

An inversion formula for the joint distribution of $p$ ratios

$\dfrac{a_{1k} X_1 + a_{2k} X_2 + \cdots + a_{nk} X_n}{b_{1k} X_1 + b_{2k} X_2 + \cdots + b_{nk} X_n}, \qquad k = 1, 2, \ldots, p \quad (1 \le p \le n),$

can be obtained from (13) by a method similar to that applied in section 4. The
following theorem holds:

Theorem 4, Let 


and $\phi(t_1, t_2, \ldots, t_n)$ be the characteristic function corresponding to $X_1,
X_2, \ldots, X_n$. Then, if $P\{\sum_j b_{jk} X_j \le 0\} = 0$ $(k = 1, 2, \ldots, p)$,


(<1*2 '•ip) ' 


(•16) 


• + tiW “K - H 


1 





The following corollary is a generalization of (10) and follows by differentiation
of (15):

Corollary. Suppose $G(x_1, x_2, \ldots, x_p)$ is the joint distribution function of
the $p$ ratios

$\dfrac{a_{1j} X_1 + \cdots + a_{nj} X_n}{b_{1j} X_1 + \cdots + b_{nj} X_n},$

and $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding to $X_1, X_2, \ldots, X_n$;
then, if $P\{\sum_k b_{kj} X_k \le 0\} = 0$, $j = 1, 2, \ldots, p$,


( 16 ) 


J.yii I 

I ... c)^n \2Tn/ J J J 


bubk 


■ b 


kp 


a” <l>ik , Ll;, 
atf 






in every p-dimensional interval in which the integral converges uniformly.


7. Joint distribution of ratios of quadratic forms. Let $X_1, X_2, \ldots, X_n$
have the joint probability density function

$f(x) = \dfrac{(\det B)^{1/2}}{(2 \pi)^{n/2}}\, e^{-\frac{1}{2} x B x'},$

where $x = (x_1, x_2, \ldots, x_n)$ and $B$ is a positive definite symmetric matrix. Suppose
$Q$ is a positive semi-definite symmetric matrix of rank $r \le n$ and
$L_1, L_2, \ldots, L_p$ is a set of symmetric matrices. We wish to obtain the joint
distribution function $G(\xi_1, \xi_2, \ldots, \xi_p)$ of the $p$ ratios

$\dfrac{X L_1 X'}{X Q X'},\ \dfrac{X L_2 X'}{X Q X'},\ \ldots,\ \dfrac{X L_p X'}{X Q X'},$

where $X = (X_1, X_2, \ldots, X_n)$.

The existence of an orthogonal matrix $S$ such that $S Q S' = I^{(r)}$, where $I^{(r)}$ is
the diagonal matrix having the first $r$ diagonal elements equal to unity and the
rest equal to zero, is well-known. Let $X = Y S$, $C = S B S'$, $M_i = S L_i S'$, and
note that $C$ and the $M_i$ are symmetric matrices. Also

$G(\xi_1, \xi_2, \ldots, \xi_p) = P\left\{ \dfrac{Y M_1 Y'}{Y I^{(r)} Y'} \le \xi_1,\ \ldots,\ \dfrac{Y M_p Y'}{Y I^{(r)} Y'} \le \xi_p \right\},$

where $Y = (Y_1, Y_2, \ldots, Y_n)$ has the probability density function

$g(y) = \dfrac{(\det C)^{1/2}}{(2 \pi)^{n/2}}\, e^{-\frac{1}{2} y C y'}$

and $y = (y_1, y_2, \ldots, y_n)$.

Suppose the $L_i$ mutually commute in pairs. Then so do the $M_i$; for $M_i M_j =
S L_i S' S L_j S' = S L_i L_j S' = S L_j L_i S' = S L_j S' S L_i S' = M_j M_i$, since $S$ is orthogonal.
Hence, there is an orthogonal matrix $U$ which simultaneously reduces
each $M_i$ to diagonal form; that is, $N_i = U M_i U'$ is a diagonal matrix (cf. Weyl
[8, p. 25]).
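The argument above can be illustrated on a small case. The sketch below takes a hypothetical pair of commuting symmetric $2 \times 2$ matrices $L_1$ and $L_2 = L_1^2$, an arbitrary rotation as the orthogonal matrix $S$, and checks that $M_i = S L_i S'$ still commute and that one orthogonal $U$ diagonalizes both. The particular matrices and helper names are assumptions made for the example only.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def commutator_norm(a, b):
    """Largest absolute entry of ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return max(abs(ab[i][j] - ba[i][j]) for i in range(len(a)) for j in range(len(a)))


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


# Hypothetical commuting symmetric pair: L2 is a polynomial in L1.
L1 = [[2.0, 1.0], [1.0, 2.0]]
L2 = matmul(L1, L1)

S = rotation(0.7)                      # any orthogonal matrix will do
M1 = matmul(matmul(S, L1), transpose(S))
M2 = matmul(matmul(S, L2), transpose(S))

# Rows of V are orthonormal eigenvectors of L1 (eigenvalues 3 and 1),
# so U = V S' diagonalizes M1 and M2 at the same time.
r = 1.0 / math.sqrt(2.0)
V = [[r, r], [r, -r]]
U = matmul(V, transpose(S))
N1 = matmul(matmul(U, M1), transpose(U))
N2 = matmul(matmul(U, M2), transpose(U))

if __name__ == "__main__":
    print(commutator_norm(M1, M2))
    print(N1)
    print(N2)
```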

Let $Y = Z U$, $D = U C U'$, so that


I ^2, 


ZN,Z' = E 

7-1 

t ) = p i ^ t . 


V 0^^ 
^IXjr, /j j 


(r) y2 ^ f j 

I ^ i 


where = 1 if j < r; 


= 0 if i > r, 


and Z = {Zi, Zi, ■ , Z„) has the probability density function 

If A — (det P) 

where 2 = (?i, Za, • • ■ , 2 „). 

We can now apply the results of section 6. If $\psi(t_1, t_2, \ldots, t_n)$ is the characteristic
function corresponding to the joint distribution function of $Z_1^2, Z_2^2, \ldots, Z_n^2$,
it is clear that

$\psi(t_1, t_2, \ldots, t_n) = \left[ \dfrac{\det D}{\det(D - 2 i T)} \right]^{1/2},$

where $T$ is the diagonal matrix whose diagonal elements are $t_1, t_2, \ldots, t_n$.
Applying (15), with $\phi = \psi$, we obtain, since $G$ is obviously a continuous distribution
function,

(17) 


(-1)^+12’’ ,Q 

- + li R- ••■/#■( g «U, - 4" t„), 



(r> A dwi ■■■ dWk 

tj,/ f - 

' j WilOi ... Wk 


-1. + LA 2 <[{■■■<( 

1-1 (iri)* J J J 


dot D 


det j da 0 — 2i5„p 2 ~ f/j) 


^ t ^niidwa < ■ • dw h 

lUl Wi Wk 


where $D = [d_{\alpha\beta}]$ and $\delta_{\alpha\beta}$ is the Kronecker delta.

It is, of course, evident that a result analogous to (17) could be obtained by
considering $p$ ratios

$\dfrac{X L_1 X'}{X Q_1 X'},\ \dfrac{X L_2 X'}{X Q_2 X'},\ \ldots,\ \dfrac{X L_p X'}{X Q_p X'},$

where the $2p$ matrices $L_1, L_2, \ldots, L_p, Q_1, \ldots, Q_p$ are symmetric and
mutually commute in pairs, and $Q_1, Q_2, \ldots, Q_p$ are positive semi-definite.

In the case $p = 1$ in (17) and for special classes of matrices $L_1$, $Q_1$, $B$ the calculus
of residues may be employed to obtain closed expressions for the distribution
of

$\dfrac{X L_1 X'}{X Q_1 X'}.$

Formula (17) can be applied to obtain the joint distribution of serial correlation
coefficients with different lags. The author plans to incorporate these
results with those mentioned at the end of section 4 in a forthcoming paper,
written jointly with Roy B. Leipnik.

The author wishes to acknowledge the valuable criticism of Professor H. Lewy,
and especially the constructive advice and suggestions of Roy B. Leipnik.

REFERENCES 

[1] M. Bôcher, Introduction to Higher Algebra, Macmillan Co., New York, 1907.

[2] H. Cramér, "Random variables and probability distributions", Cambridge Tracts in
Mathematics, No. 36, Cambridge, 1937.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] J. H. Curtiss, "Distribution of a quotient", Annals of Math. Stat., Vol. 12 (1941).

[5] R. C. Geary, "The frequency distribution of the quotient of two normal variates",
Roy. Stat. Soc. Jour., Vol. 103 (1930).

[6] R. C. Geary, "Extension of a theorem by Harald Cramér", Roy. Stat. Soc. Jour., Vol.
107 (1944).

[7] M. G. Kendall, The Advanced Theory of Statistics, Griffin Co., London, 1943.

[8] H. Weyl, The Theory of Groups and Quantum Mechanics, Methuen and Co., Ltd.,
London, 1931.



THE FACTORIAL APPROACH TO THE WEIGHING PROBLEM$^1$


By O. Kempthorne
Iowa State College

1. Summary. The weighing problem is discussed from the point of view of
factorial experimentation. The paper contains a brief description of the fractional
replication of the $2^n$ factorial system. It is shown that optimum designs
for the weighing problem may easily be obtained with this approach. This
approach is valuable in indicating the structure of weighing problem designs, and
the limited conditions under which such designs can give results of value.

2. Introduction. Considerable attention has been given recently to the
problem of weighing a number of light objects on a scale [1, 2, 3, 6]. The problem
was originally proposed by Yates in his paper on complex experiments [4] as an
example of a factorial experiment in which interactions between the factors tested
would not be expected to exist, that is, the weight of say two objects could be
assumed to be the sum of the weights of the objects weighed separately, after
taking account of any necessary zero corrections. Such a situation is comparatively
rare in biological research when, for example, the effect on yield of a particular
crop from the joint application of two nutrients is usually different from the
sum of the effects of separate applications. In recent years attention has been
given to the use of fractional replication in factorial experiments [7, 8, 9] and it
is proposed in this paper to consider the weighing problem from this point of
view.


3. The $2^n$ factorial system. A full description of the $2^n$ factorial system was
given by Yates in his technical communication The Design and Analysis of
Factorial Experiments [5]. Yates was particularly concerned with the analysis
of such experiments and with the evolution of systems of confounding in order
to reduce the number of plots in each block. The following brief account is
given in order to facilitate the discussion of the weighing problem.

In a single replication of the $2^n$ system all combinations of $n$ factors each at
two levels are tested. With three factors, $a$, $b$, $c$, for example, the following
eight combinations are tested: $(1)$, $a$, $b$, $ab$, $c$, $ac$, $bc$, and $abc$, where $(1)$ denotes the
control, $a$ the application of treatment $a$ only, $ab$ the application of treatments
$a$ and $b$, and so on. A set of seven independent comparisons between the eight
test results is given formally by the expansion of the formula

$\tfrac{1}{4}(a \pm 1)(b \pm 1)(c \pm 1),$


$^1$ Journal Paper No. J 1648 of the Iowa Agricultural Experiment Station, Ames, Iowa.
Project No. 890.





where at least one of the signs is negative. If, for instance, the first sign only is
taken to be negative, a formal expansion gives the expression

$\tfrac{1}{4}\{abc - bc + ab - b + ac - c + a - (1)\},$

and this contrast of the observations gives the effect of the factor $a$ averaged
over the presence and absence of the factors $b$ and $c$, which is denoted by effect $A$.
Similarly, taking the negative sign in the second bracket only, we get the average
effect $B$,

$B = \tfrac{1}{4}\{abc - ac + ab - a + bc - c + b - (1)\}.$

Taking negative signs in the first and second brackets we obtain the interaction
$AB$:

$AB = \tfrac{1}{4}\{abc + c + ab + (1) - ac - bc - a - b\},$ and so on.
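The formal expansion of $(a \pm 1)(b \pm 1)(c \pm 1)$ can be mechanised. The following sketch derives the sign attached to each treatment combination and evaluates a contrast with the divisor 4 used above for three factors. The function names and the additive example yields are illustrative assumptions.

```python
from itertools import product

FACTORS = "abc"


def treatment_label(levels):
    """Name a treatment combination, e.g. (1, 0, 1) -> 'ac'; the control is '(1)'."""
    label = "".join(f for f, x in zip(FACTORS, levels) if x)
    return label or "(1)"


def contrast_signs(negative):
    """Signs obtained by expanding (a ± 1)(b ± 1)(c ± 1).

    `negative` is the set of factors whose bracket carries the minus sign.
    A bracket contributes its letter (coefficient +1) when the factor is
    present and its constant term (+1 or -1) when the factor is absent.
    """
    signs = {}
    for levels in product((0, 1), repeat=len(FACTORS)):
        sign = 1
        for factor, present in zip(FACTORS, levels):
            if not present and factor in negative:
                sign = -sign
        signs[treatment_label(levels)] = sign
    return signs


def effect(yields, negative):
    """Contrast of the observations with the divisor 4 (three factors)."""
    signs = contrast_signs(negative)
    return sum(signs[t] * yields[t] for t in signs) / 4.0


# Hypothetical additive yields: control 10, with a, b, c adding 3, 5, 7.
EXAMPLE_YIELDS = {
    treatment_label(lv): 10 + 3 * lv[0] + 5 * lv[1] + 7 * lv[2]
    for lv in product((0, 1), repeat=3)
}

if __name__ == "__main__":
    print(contrast_signs({"a", "b"}))
    print(effect(EXAMPLE_YIELDS, {"a"}), effect(EXAMPLE_YIELDS, {"a", "b"}))
```

With purely additive yields, as in the weighing problem, the main-effect contrasts recover the individual contributions and the interaction contrasts vanish.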

The definition of effects and interactions may be presented very simply in
geometrical terminology, by representing the treatment combinations as points
of an n-dimensional lattice, each axis of the lattice having two points at unit
distance apart. The control treatment will have coordinates $(0, 0, 0, \ldots, 0)$,
the treatment consisting of $a$ only will have coordinates $(1, 0, 0, \ldots, 0)$,
and so on. The effect $A$ is then the difference of the mean yield of the treatments
corresponding to the points lying on the hyperplane

$x_1 = 0,$

and the mean yield of those represented by points lying on the hyperplane

$x_1 = 1.$

The interaction of two factors $a$ and $b$, represented by the axes $x_1$ and $x_2$ respectively,
will be obtained from the difference of the mean yields of those plots for
which

$x_1 + x_2 = 0$, or $x_1 + x_2 = 2,$

and those for which

$x_1 + x_2 = 1.$

The extensions to the above for three-factor and higher order interactions are
simple. The interaction of factors $a$, $b$, and $c$, which are represented by coordinate
axes $x_1$, $x_2$, and $x_3$, is given by the difference between the mean of plots
represented by points for which

$x_1 + x_2 + x_3 = 0$ or $2,$

and those represented by points for which

$x_1 + x_2 + x_3 = 1$ or $3;$

in other words, it is the difference of the mean yields of those plots for which

$x_1 + x_2 + x_3 = 0 \pmod 2$

and of those for which

$x_1 + x_2 + x_3 = 1 \pmod 2.$

Each effect or interaction is then defined as the mean difference of two sets of
plots, each set being represented by points on parallel hyperplanes, and the planes
of one set of parallel hyperplanes lying between the planes of the other set. It is
necessary to specify only the direction cosines of the hyperplanes in order to
specify the effect or interaction, and the usual terminology for effects and interactions
follows, in that the interaction of factors $a$, $b$, $c$, for example, may be represented
by the symbol $ABC$.
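The geometrical definition translates directly into a computation on lattice points. The sketch below forms the two parity classes of $\sum x_i \pmod 2$ over the factors in an effect and takes the difference of their mean yields. The sign convention (the class whose parity equals the number of factors in the word is taken as positive) is an assumption chosen to agree with the algebraic expansions given earlier; the example yields are hypothetical.

```python
from itertools import product


def interaction(yields, word):
    """Difference of mean yields between the two parity classes.

    `yields` maps lattice points (tuples of 0/1) to observations and
    `word` lists the coordinate indices of the factors in the effect,
    e.g. (0,) for A, (0, 1) for AB, (0, 1, 2) for ABC.
    """
    positive, negative = [], []
    for point, y in yields.items():
        parity = sum(point[i] for i in word) % 2
        if parity == len(word) % 2:
            positive.append(y)
        else:
            negative.append(y)
    return sum(positive) / len(positive) - sum(negative) / len(negative)


# Hypothetical yields with main effects and one two-factor interaction term.
EXAMPLE = {
    p: 10 + 3 * p[0] + 5 * p[1] + 7 * p[2] + 2 * p[0] * p[1]
    for p in product((0, 1), repeat=3)
}

if __name__ == "__main__":
    print(interaction(EXAMPLE, (0,)))       # effect A
    print(interaction(EXAMPLE, (0, 1)))     # interaction AB
    print(interaction(EXAMPLE, (0, 1, 2)))  # interaction ABC
```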

In the same way as effects and interactions are defined in terms of the yields
of the several treatment combinations, the expected yield from each treatment
combination may be expressed in terms of the mean level of yield and the true
effects and interactions. If the full set of combinations of the factorial scheme
is tested, the best estimate of each true effect and interaction is the same function
of the observed yields that the true effect or interaction is of the true yields.
This fact is one of the advantages which follow from the use of the full factorial
scheme.

We are not concerned here with factorial experiments in which the factors have
more than two levels, but when the number of levels of each factor is the same
prime number, effects and interactions may be represented by products of powers
of the symbols for the factors. In the case of two factors $(a, b)$ at three levels,
for example, the main effects may be represented by $A$, $B$, and the interactions
by $AB$ and $AB^2$, each symbol referring to the two independent contrasts between
three sets, each of three plots.

As an example of the use of the above representation, we may consider confounding,
that is, the arrangement of the treatment combinations in blocks in
order to reduce the experimental error. Suitable arrangements are such that
contrasts between the blocks represent high order interactions which the experimenter
is confident will be of negligible size.

If treatment combinations for which

$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0 \pmod 2$

and for which

$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = 0 \pmod 2$

are arranged in a particular block, then the coordinates of the treatment combinations
in this block also lie on the hyperplane

$(\alpha_1 + \beta_1) x_1 + (\alpha_2 + \beta_2) x_2 + \cdots + (\alpha_n + \beta_n) x_n = 0 \pmod 2,$

where the coefficients $(\alpha_i + \beta_i)$ must be reduced modulo two. If, therefore, the
treatments are arranged in blocks so that two comparisons are block contrasts,
then the generalised interaction of these contrasts is also a block contrast.




4. Fractional replication. The principle of fractional replication follows
very simply. Suppose only those treatment combinations whose yields all occur
either in the positive or the negative part of a particular contrast are represented
in the experiment, that is, only those combinations represented by the points of
the lattice for which, say,

$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0 \pmod 2.$

Then the comparison between the yields of those plots represented by

$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = 0 \pmod 2$

and by

$\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = 1 \pmod 2$

will be identical with the comparison between the yields of plots represented
by

$(\alpha_1 + \beta_1) x_1 + (\alpha_2 + \beta_2) x_2 + \cdots + (\alpha_n + \beta_n) x_n = 0 \pmod 2$

and by

$(\alpha_1 + \beta_1) x_1 + (\alpha_2 + \beta_2) x_2 + \cdots + (\alpha_n + \beta_n) x_n = 1 \pmod 2.$

The former of these two comparisons may be represented by the symbol
$x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n}$ and the latter by $x_1^{\alpha_1 + \beta_1} x_2^{\alpha_2 + \beta_2} \cdots x_n^{\alpha_n + \beta_n}$, where $x_1, \ldots, x_n$
are no longer coordinates but symbols for the $n$ factors, which satisfy the relations
$x_i^{a} = 1$ if $a = 0 \pmod 2$. The equivalence of the two comparisons may be
obtained by the use of an identity relationship in the symbols for the factors

$I = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n},$

where $I$ is interpreted as unity, and only those combinations whose coordinates
$(x_1, x_2, \ldots, x_n)$ satisfy one of the equations

$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0$ or $= 1 \pmod 2$

are represented in the experiment. If this identity relationship is multiplied
by the symbol $x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n}$ by ordinary commutative algebra, reducing the
powers modulo 2 where necessary, we obtain

$x_1^{\beta_1} x_2^{\beta_2} \cdots x_n^{\beta_n} = x_1^{\alpha_1 + \beta_1} x_2^{\alpha_2 + \beta_2} \cdots x_n^{\alpha_n + \beta_n}.$

It is more convenient to revert to the common use of capital letters $A$, $B$, $C$, etc.
for effects corresponding to small letters $a$, $b$, $c$, etc. for the factors tested. An
experiment in half-replicate is then represented formally by an equation of the
type

$I = A^{\alpha_1} B^{\alpha_2} C^{\alpha_3} \cdots$
In such an experiment on n factors only $2^{n-1}$ treatment combinations will be
tested. Of the $2^n - 1$ independent comparisons in a fully replicated experiment,
information on one comparison is lost completely since only those treatments
which appear in the comparison with the same sign are represented;
the remaining $2^n - 2$ independent comparisons of a fully replicated experiment
are identical in pairs, giving $2^{n-1} - 1$ independent comparisons. Each comparison
is then said to have two aliases and measures the sum (or difference,
depending on which half of the treatment combinations is used) of two effects,
an effect and an interaction, or two interactions.

A quarter-replicated experiment can by the same process be represented by an
identity relationship of the form

$$I = A^{\alpha_1} B^{\alpha_2} \cdots = A^{\beta_1} B^{\beta_2} \cdots = A^{\alpha_1+\beta_1} B^{\alpha_2+\beta_2} \cdots$$

It is useful in the evolution of fractional designs to note that the elements in the
identity relationship form an Abelian group.

Fractionally replicated experiments are formally identical with confounded
experiments in that block differences may be regarded as additional factors in the
confounded experiment. A $2^n$ experiment arranged in $2^p$ blocks, for example,
may be regarded as a 1 in $2^p$ design of a $2^{n+p}$ experiment. Considerable care
needs to be exercised in the use of fractionally replicated designs, but they have
been found to be very useful in agricultural and biological research.

5. The weighing problem. The problem of weighing a number of objects
may be regarded as the problem of the estimation of the effects of a number of
factors which do not interact. To take a simple case, consider the estimation of
the effects of factors a, b, and c for which one complete replicate would consist
of the combinations

(1), a, b, ab, c, ac, bc, and abc.

Suppose a half replicate design is used, based on the identity relationship

I = ABC.

The combinations tested would then consist of either the set {a, b, c, abc} or the
set {(1), ab, ac, bc}. If the former set were chosen, the comparison estimating
the effect A could also be ascribed to the interaction BC, that estimating effect
B also to the interaction AC, and that estimating effect C to the interaction AB,
as can be observed by multiplying the identity relationship by A, B, and C in
turn. If the experimenter is confident that the two-factor interactions are
negligible, then any effect given by each comparison would be ascribed to the
main effect.
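The alias rule for the half replicate can be checked mechanically. The following is a minimal sketch, not part of the original paper; the function names `multiply` and `half_replicate` are illustrative assumptions. It multiplies effect symbols with the square of each letter set to unity and splits the eight combinations according to the parity of the letters they share with ABC.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Product of two effect symbols, with the square of each letter equal to unity."""
    product = set(word_a) ^ set(word_b)  # letters occurring twice cancel
    return "".join(sorted(product)) or "I"


def half_replicate(factors, defining_word, parity):
    """Combinations sharing an even (parity=0) or odd (parity=1) number of
    letters with the defining word of the identity relationship."""
    runs = []
    for size in range(len(factors) + 1):
        for combo in combinations(factors, size):
            common = len(set(combo) & set(defining_word.lower()))
            if common % 2 == parity:
                runs.append("".join(combo) or "(1)")
    return runs


# Aliases obtained by multiplying I = ABC by A, B and C in turn.
aliases = {effect: multiply(effect, "ABC") for effect in "ABC"}
even_half = half_replicate("abc", "ABC", parity=0)
odd_half = half_replicate("abc", "ABC", parity=1)
print(aliases, even_half, odd_half)
```

Representing each symbol as a set of letters makes "reducing the powers modulo 2" a symmetric difference, which is why no explicit exponent arithmetic is needed.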

6. Discussion of a particular case. We give the derivation of a design for
weighing a particular number of objects, say ten. Let the objects be denoted
by a, b, c, d, e, f, g, h, k, l. Then the total number of combinations which could
be tested is $2^{10}$, that is 1024, but as we are confident that interactions are negligible,
it is necessary only to estimate main effects.

A fractionally replicated design must consist of a number $2^p$ of combinations
and this will be a 1 in $2^{10-p}$ design. A suitable fractionally replicated design
consisting of 16 combinations will exist if it is possible to evolve an identity
relationship for a 1 in 64 design, such that each term in the relationship involves
at least three letters. A possible identity relationship for such a design contains
the members of the Abelian group obtained from all combinations of the elements
I, ABC, CDE, EFG, GHK, ADL, and AFH, with the rule that the square of each
letter is to be equated to unity. Each possible comparison may then be due to
any of the 64 effects or interactions which may be derived from this identity
relationship. In other words, each comparison has 64 aliases: in the case of ten
of the comparisons, only one of the aliases is a main effect, and for the remaining
five comparisons the aliases are all interactions of at least two factors. The
actual design may be written down by finding combinations of the letters which
have an even number of letters in common with each of the six
three-factor interactions. These are themselves a group consisting of all combinations
of unity and four combinations of letters. The sixteen combinations
with an even number of letters in common with all the members of the identity
group are found to be the following:


(1),     abdef,   acefl,   bcdl,
abfgkl,  degkl,   bcegk,   acdfgk,
fgh,     abdegh,  aceghl,  bcdfghl,
abhkl,   defhkl,  bcefhk,  acdhk.
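The sixteen combinations can be recovered by direct enumeration. The sketch below is illustrative only and is not taken from the paper; the name `weighing_design` is an assumption. It scans all 1024 subsets of the ten letters and keeps those sharing an even number of letters with each of the six three-letter generators, which is enough to guarantee an even overlap with every member of the identity group.

```python
from itertools import combinations

LETTERS = "abcdefghkl"
GENERATORS = ["abc", "cde", "efg", "ghk", "adl", "afh"]


def weighing_design(letters, generators):
    """All combinations with an even number of letters in common with
    every generator of the identity relationship."""
    runs = []
    for size in range(len(letters) + 1):
        for combo in combinations(letters, size):
            chosen = set(combo)
            if all(len(chosen & set(word)) % 2 == 0 for word in generators):
                runs.append("".join(combo) or "(1)")
    return runs


design = weighing_design(LETTERS, GENERATORS)
# Number of weighings in which each object appears.
appearances = {x: sum(x in run for run in design if run != "(1)") for x in LETTERS}
print(len(design), design)
```

Each object appears in exactly half of the weighings, which is what makes the later estimation by simple differences possible.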


The estimation of effects from the results of the sixteen weighings is particularly
easy: the weight of object a will be one-eighth of the difference between
those weighings containing a and those not containing a. There are ten such
contrasts which estimate the effects, and the remaining five contrasts may be
used to obtain an estimate of the experimental error. If $\sigma^2$ is the variance of
each weighing, the variance of the weight of a, that is, the effect A, will be
$(1/8 + 1/8)\sigma^2 = (1/4)\sigma^2$. The precision can be increased fourfold in the weighing problem
with a chemical balance by interpreting the absence of each letter as the
placing of the object in the left hand pan and the presence as the placing of the
object in the right hand pan. Each effect will then measure twice the weight
of the corresponding object and the estimated weight of each object will have a
variance of $\sigma^2/16$; that is, the same precision as if each had been weighed by itself
eight times in each pan, or sixteen times in all.
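The variance figures quoted above follow from the fact that every object appears in eight of the sixteen weighings. A minimal sketch, assuming uncorrelated weighings of common variance $\sigma^2$ and using exact fractions; the names `all_runs` and `variance_factor` are illustrative, and the four basis combinations are those that generate the design listed in Section 6.

```python
from fractions import Fraction
from itertools import combinations

LETTERS = "abcdefghkl"
BASIS = ["abdef", "acefl", "abfgkl", "fgh"]  # four combinations generating the design


def all_runs(basis):
    """The group of combinations generated by the basis (squares of letters = unity)."""
    runs = []
    for size in range(len(basis) + 1):
        for subset in combinations(basis, size):
            letters = set()
            for word in subset:
                letters ^= set(word)
            runs.append(frozenset(letters))
    return runs


def variance_factor(runs, letter, balance):
    """Variance of the estimated weight divided by sigma^2."""
    n_with = sum(letter in run for run in runs)
    n_without = len(runs) - n_with
    # Spring balance: mean of weighings containing the object minus mean of the rest.
    spring = Fraction(1, n_with) + Fraction(1, n_without)
    # Chemical balance: the same contrast measures twice the weight, so divide by 4.
    return spring if balance == "spring" else spring / 4


runs = all_runs(BASIS)
spring = {x: variance_factor(runs, x, "spring") for x in LETTERS}
chemical = {x: variance_factor(runs, x, "chemical") for x in LETTERS}
print(spring["a"], chemical["a"])
```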

7. General case. The rules by which fractional designs may be constructed 
have been exemplified above and the procedure is simple, though laborious in the 
case of a large number of objects. It does not, therefore, seem worth while to 
enumerate particular designs for the weighing of particular numbers of objects. 
A general procedure in considering the design for a particular problem is as
follows. Taking the case of a number n of objects, the experimenter should
form a rough idea of the order of magnitude of the experimental error, $\sigma$ say, and
decide what accuracy he requires for his estimates of the weights, a standard
error say of s. Then if he weighs $2^p$ combinations of the objects, the standard
error of the estimate of each weight will be $2^{-p/2}\sigma$ in the case of the chemical
balance. This serves to determine $2^p$ and therefore p, and it is then necessary
to design a $2^n$ experiment of fraction $2^{p-n}$. Alternatively, a design of higher
fraction which can provide estimates may be replicated the corresponding number
of times. In the case of the spring balance the corresponding standard error is
$2^{-(p-2)/2}\sigma$, necessitating a design of higher fraction.
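The choice of p described above amounts to solving an inequality for the smallest admissible power of two. A hedged sketch follows; the function name `weighings_needed` and its arguments are assumptions made for illustration. It takes the standard errors to be $2^{-p/2}\sigma$ for the chemical balance and $2^{-(p-2)/2}\sigma$ for the spring balance.

```python
import math


def weighings_needed(sigma, s, balance="chemical"):
    """Smallest p, and the number 2**p of weighings, for which the standard
    error of each estimated weight does not exceed s."""
    required = (sigma / s) ** 2      # chemical balance: 2**p >= (sigma / s)**2
    if balance == "spring":
        required *= 4                # spring balance: 2**p >= 4 * (sigma / s)**2
    p = max(0, math.ceil(math.log2(required)))
    return p, 2 ** p


print(weighings_needed(1.0, 0.25))             # chemical balance
print(weighings_needed(1.0, 0.25, "spring"))   # spring balance
```

In practice the design must also supply at least n contrasts for the n weights, so $2^p - 1$ cannot be smaller than n; the sketch does not check this.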

Designs of the type described above have some useful properties: 

(1) the design automatically takes care of any bias in the balance, 

(2) the effects or weights may be computed easily as indicated above, 

(3) the effects or weights are uncorrelated, 

(4) all the effects are measured with the same precision, and 

(5) an estimate of the experimental error which is independent of the effects
may be computed from the results.

In considering the use for a particular problem of a design like the one discussed, 
it is important to understand completely the structure of the design. Such 
designs may well have real value for the weighing problem, but it is not easy to 
visualize other problems for which they would not give results capable of various 
interpretations. The use of the above designs depends on an assumption that 
interactions between pairs of factors are negligible, and this is generally not the 
case, for example, in biological research work, in which the experimenter may well 
be confident that interactions between three or more factors are negligible. In 
the particular case described in detail, there are only fifteen independent comparisons
between the sixteen results which will be obtained, and it follows from
the identity relationship that the comparison giving the effect A also measures
the two-factor interactions BC, DL, and FH, together with EK, since
AEK = EFG × GHK × AFH is also a member of the identity group. If therefore the factor a has no
effect and there is an interaction between factors b and c or within one of the other three pairs
of factors, the experimenter will conclude that the factor a has an effect. It is
clear that under these circumstances the experimental results are worthless.
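The two-factor aliases of A can be listed by forming the full identity group and multiplying each member by A. This is an illustrative sketch, with `identity_group` and `aliases_of` as assumed names. The enumeration returns BC, DL and FH and, in addition, EK, because AEK = EFG × GHK × AFH belongs to the group.

```python
from itertools import combinations

GENERATORS = ["ABC", "CDE", "EFG", "GHK", "ADL", "AFH"]


def identity_group(generators):
    """All products of the generators, with the square of each letter equal to unity."""
    group = set()
    for size in range(len(generators) + 1):
        for subset in combinations(generators, size):
            word = set()
            for generator in subset:
                word ^= set(generator)
            group.add(frozenset(word))
    return group


def aliases_of(effect, group):
    """Every effect or interaction confounded with the given effect."""
    return {frozenset(set(effect) ^ word) for word in group}


group = identity_group(GENERATORS)
shortest_word = min(len(word) for word in group if word)
two_factor = sorted("".join(sorted(w)) for w in aliases_of("A", group) if len(w) == 2)
print(len(group), shortest_word, two_factor)
```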


8. Efficiency of designs. In [2], Mood states for optimum designs that
when N weighings are made, the variances of the estimates of the weights are of
the order of $\sigma^2/N$ in the case of the chemical balance and $4\sigma^2/N$ in the case of the
spring balance, where $\sigma^2$ is the variance of a single weighing. As indicated above,
when N is a power of 2, the fractional factorial designs result in the same
variances. These designs are similar to those proposed by Kishen [6].

Mood dealt with the case in which the number of weighings was restricted, 
for example to be equal to the number of objects, and defined a best design as that 
which gave the smallest confidence region in the p-dimensional space for the 
estimates of the p-weights. 

To take an example, for the weighing of 3 objects with a spring balance with
no bias he suggests the following design:

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$

where the rows of the array refer to weighing operations and the columns refer
to objects. If the results of the weighings are $y_1$, $y_2$, and $y_3$ respectively, the
estimates of the weights $b_1$, $b_2$, and $b_3$ are given by the equation

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ -1/2 & 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}.$$

If $\sigma^2$ is the variance of a single weighing, then the variance of each estimate will
be $[(1/2)^2 + (1/2)^2 + (-1/2)^2]\sigma^2 = (3/4)\sigma^2$; or if N (= 0 (mod 3)) weighings are
made by replicating the above system N/3 times, the variance of each estimate
will be $9\sigma^2/4N$. The covariance of any two estimates is $(-1/4)\sigma^2$, so that the
correlation between any two estimates is $-1/3$ and its square is 1/9. The fractional
factorial design will yield estimates which have a somewhat higher variance,
namely $4\sigma^2/N$. This increase in precision obtained in Mood's design has been
obtained at the expense of obtaining correlated estimates which in addition are
subject to any bias which the measuring instrument may have. It is doubted
whether the use of such designs for any practical problem can be justified.
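The variance, covariance and correlation quoted for Mood's three-object design can be verified with exact arithmetic. The sketch below is illustrative; the helper names `matmul` and `transpose` are assumptions. It confirms that the stated coefficient matrix inverts the design matrix and then forms the covariance matrix of the estimates in units of $\sigma^2$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]            # rows: weighings, columns: objects
X_INV = [[HALF, HALF, -HALF],
         [HALF, -HALF, HALF],
         [-HALF, HALF, HALF]]                     # coefficients of y in the estimates


def matmul(left, right):
    """Plain matrix product of two lists of rows."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


identity_check = matmul(X_INV, X)                 # should be the unit matrix
cov = matmul(X_INV, transpose(X_INV))             # covariance matrix / sigma^2
variance, covariance = cov[0][0], cov[0][1]
corr_squared = (covariance / variance) ** 2
# Variance times N: 9/4 for the replicated Mood design, 4 for the fractional design.
ratio = (3 * variance) / 4
print(variance, covariance, corr_squared, ratio)
```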

It is of interest to note that the concept of fractional replication may be extended
to give designs requiring a number of weighings other than a power of
two. For the weighing of 3 objects, for example, a factorial design of fraction
3/4 could be used: it could consist of a half-replicate based on the identity I =
ABC, and a quarter replicate based on the identity

I = A = BC = ABC. 

There is, however, little point in discussing such designs for the weighing problem, 
as their efficiency as measured by the total number of weighings needed to achieve 
a particular degree of accuracy is lower than for the designs described in this 
paper. 


REFERENCES 

[1] Harold Hotelling, "Some improvements in weighing and other experimental techniques", Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[2] A. M. Mood, "On Hotelling's weighing problem", Annals of Math. Stat., Vol. 17 (1946), pp. 432-446.

[3] R. L. Plackett and J. P. Burman, "The design of optimum multifactorial experiments", Biometrika, Vol. 33 (1946), pp. 305-325.

[4] F. Yates, "Complex experiments", Roy. Stat. Soc. Jour. Suppl., Vol. 2 (1935), pp. 181-247.

[5] F. Yates, "The design and analysis of factorial experiments", Technical Communication No. 35, Imperial Bureau of Soil Science, Harpenden, Herts, England, 1937.

[6] K. Kishen, "On the design of experiments for weighing", Annals of Math. Stat., Vol. 16 (1945), pp. 294-301.

[7] D. J. Finney, "The fractional replication of factorial arrangements", Annals of Eugenics, Vol. 12 (1945), pp. 291-301.

[8] D. J. Finney, "Recent developments in the design of field experiments. III. Fractional replication", Jour. of Agric. Sci., Vol. 36 (1946), pp. 184-191.

[9] O. Kempthorne, "A simple approach to confounding and fractional replication in factorial experiments", Biometrika, Vol. 34 (1947), pp. 255-272.



MULTIPLE SAMPLING FOR VARIABLES 
By Jack Silber
Roosevelt College 

Summary. A multiple (sequential) sampling scheme designed to test certain
hypotheses is discussed with the following assumption: X is a random variable
with density function P(x) which is piecewise continuous and differentiable at
its points of continuity. Formulae are derived for the probability of acceptance
and rejection of the hypothesis and for the expected number of samples necessary
for reaching a decision. These formulae are found to depend on the solution of
a Fredholm integral equation. Explicit solutions to the problem are obtained
for the case when P(x) is rectangular by reducing the fundamental integral
equation to a set of differential-difference equations. Several examples are
given.

1. Introduction. A multiple sampling scheme is here proposed which is
based on cumulative sums of random variables. Bartky [1] has developed a
theory of multiple sampling for attributes when the attribute can take only two
values with probability (p) and (1 − p) respectively. Formulae are there derived
for the probabilities of acceptance and rejection of the null hypothesis and
for the expected amount of sampling necessary for reaching a decision. In
this paper the same type of formulae are developed for the case of variable sampling
where the underlying probability law for the variable is given by a piecewise
continuous function for which derivatives exist at its points of continuity.

The whole theory of multiple sampling is closely related to Wald's [2] theory
of sequential tests. The fundamental difference is that in the latter, probabilities
of errors of the first and second kinds are assigned, and acceptance and
rejection criteria derived therefrom, while in the former the problem is solved in
reverse order. There the acceptance and rejection criteria are assigned, and
probabilities of eventual acceptance and rejection derived. For different parameter
values, these are the probabilities of making errors of the first and second
kinds.

The problem presented here is similar to that given by Wald [3] in his paper
on cumulative sums. In the present paper we waive the restriction that the
expected number of items necessary for termination of the cumulating process
be given explicitly as an integer. Since the theory given here is from the point
of view of multiple sampling, the language appropriate to that theory will be
used.

2. The sampling scheme. Let X be a random variable with probability
density function P(x) which is piecewise continuous. One variate, say $x_1$,
is selected and if $x_1 > b$, the hypothesis (for example the null hypothesis with
respect to the mean) is accepted, and if $x_1 < a$, the hypothesis is rejected. If,
however, $a \le x_1 \le b$, another variate $x_2$ is selected. In the latter case similar
criteria with respect to $x_1 + x_2$ determine whether the hypothesis is to be accepted
or rejected, or this method of sampling continued. Or more formally, let

$$S_r = \sum_{i=1}^{r} x_i \qquad (r = 1, 2, 3, \ldots),$$

where the cumulative sums $S_r$ are formed sequentially as follows: for any integer
r the cumulating process is terminated by acceptance of the hypothesis if $S_r > b$
and rejection if $S_r < a$, but if $a \le S_r \le b$ another variate $x_{r+1}$ is selected and the
sum $S_{r+1}$ formed. The acceptance and rejection criteria are then applied as
above. No attempt is made here to indicate the choice of the acceptance and
rejection criteria.
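The scheme can be written as a short loop over cumulative sums. The following is a minimal sketch, not the author's procedure; the function name `run_scheme`, the argument `draw` (any zero-argument function returning the next variate), and the safety cap `max_draws` are assumptions made for illustration.

```python
import random


def run_scheme(draw, a, b, max_draws=10_000):
    """Cumulate variates until the sum leaves [a, b].

    Returns the decision and the number of variates used."""
    total = 0.0
    for r in range(1, max_draws + 1):
        total += draw()
        if total > b:
            return "accept", r
        if total < a:
            return "reject", r
    return "undecided", max_draws  # cap reached; not part of the scheme itself


# Deterministic illustration with a fixed sequence of variates.
fixed = iter([0.2, 0.1, 0.5])
accepted = run_scheme(lambda: next(fixed), a=-0.5, b=0.5)

# Illustration with rectangular variates on (-1, 1).
rng = random.Random(0)
outcome = run_scheme(lambda: rng.uniform(-1.0, 1.0), a=-0.5, b=0.5)
print(accepted, outcome)
```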


3. The probability of acceptance. If at the rth unit the hypothesis is neither
accepted nor rejected, then it must be true that $a \le S_r \le b$. Let us denote the
probability that this condition holds by

(3.1)    $\int_a^b Y_r(S_r)\, dS_r ,$

where $Y_r(S_r)$ is the probability density function for $S_r$ in the above described
sampling scheme. The probability density function for $S_{r+1}$ would then be given
by

(3.2)    $Y_{r+1}(S_{r+1}) = \int_a^b Y_r(S_r)\, P(S_{r+1} - S_r)\, dS_r .$

The probabilities of accepting or rejecting the hypothesis on the rth trial are
respectively

(3.3)    $\int_b^{\infty} Y_r(S_r)\, dS_r , \qquad \int_{-\infty}^{a} Y_r(S_r)\, dS_r ,$

and therefore the probabilities for eventual acceptance or rejection are given
by

(3.4)    $P_a = \sum_{r=1}^{\infty} \int_b^{\infty} Y_r(S_r)\, dS_r , \qquad P_r = \sum_{r=1}^{\infty} \int_{-\infty}^{a} Y_r(S_r)\, dS_r .$

The probability that $a \le S_n \le b$ cannot exceed the probability that
$a \le T_n = x_1 + x_2 + x_3 + \cdots + x_n \le b$ on a single sample of n variates, that is
$\Pr(a \le S_n \le b) \le \Pr(a \le T_n \le b)$. For distributions with positive variance, it can
be shown that the right member of the above inequality approaches zero as $n \to \infty$.
Therefore, the process of sampling as outlined above will eventually lead to
acceptance or rejection of the hypothesis. See Wald [3, p. 284] for a direct proof
that the probability that the left member of the above inequality holds for
n = 1, 2, 3, ... is zero.





Consider the linear integral (Fredholm) equation

(3.5)    $Y(x) = Y_1(x) + \lambda \int_a^b P(x - s)\, Y(s)\, ds,$

where $Y_1(x) = P(x)$, and assume a solution of the form

(3.6)    $Y(x) = Y_1(x) + \lambda Y_2(x) + \lambda^2 Y_3(x) + \cdots .$

That solutions, in power series in $\lambda$, of the Fredholm equation exist when the
kernel $P(x - s)$ and the function $Y_1(x)$ have finite discontinuities is well known
and the theory has been expounded by several authors. (For example see
Goursat [4].) If the power series in $\lambda$ is substituted in the integral equation we
obtain

(3.7)    $Y_1(x) + \lambda Y_2(x) + \lambda^2 Y_3(x) + \cdots$
         $= Y_1(x) + \lambda \int_a^b [Y_1(s) + \lambda Y_2(s) + \lambda^2 Y_3(s) + \cdots]\, P(x - s)\, ds$
         $= Y_1(x) + \lambda \int_a^b Y_1(s) P(x - s)\, ds + \lambda^2 \int_a^b Y_2(s) P(x - s)\, ds + \lambda^3 \int_a^b Y_3(s) P(x - s)\, ds + \cdots .$

Equating coefficients of like powers of $\lambda$ we see that

(3.8)    $Y_r(x) = \int_a^b Y_{r-1}(s)\, P(x - s)\, ds, \qquad (r = 2, 3, \ldots).$

This, however, is the probability distribution for $S_r$, r = 2, 3, ... under our
sampling scheme, and therefore from the equations

(3.9)    $Y(x) = \sum_{r=1}^{\infty} Y_r(x) = Y_1(x) + \lambda \int_a^b P(x - s)\, Y(s)\, ds,$

we have that the probability of eventual acceptance for $\lambda = 1$ is

(3.10)   $\sum_{r=1}^{\infty} \int_b^{\infty} Y_r(S_r)\, dS_r = \int_b^{\infty} Y(x)\, dx.$


Thus our problem of finding a formula for the probability of eventual acceptance
or rejection of the statistical hypothesis under the above sampling scheme reduces
to that of finding a solution of a linear integral equation.

The argument in this section has, of course, been entirely formal. However,
from the general theory of integral equations we know that the series solution
(3.6) converges uniformly for $\lambda < 1/M(b - a)$, where $P(x) \le M$, since P(x)
is a probability density function. In equations (3.4) and (3.10) we give solutions
for $\lambda = 1$ and of course we assume that $M(b - a) < 1$. Since (3.6) is uniformly
convergent, the interchanges of integration and summation in (3.10) and (4.3)
in the following section are allowable.





4. The expected amount of sampling. Since

(4.1)    $\int_a^b Y_{r-1}(S_{r-1})\, dS_{r-1}$

is the probability that the rth sample will be reached, then the probability that
on the rth sample, the hypothesis will be either accepted or rejected becomes

(4.2)    $\int_a^b Y_{r-1}(S_{r-1})\, dS_{r-1} - \int_a^b Y_r(S_r)\, dS_r ,$

that is, the first term in this expression gives the probability that no terminating
decision is made on the (r − 1)st sample and the second term gives the probability
that a like decision is made on the rth sample. The difference of the
two then gives the probability that a terminating decision (acceptance or rejection)
will be made on the rth sample. The expected number of units sampled
will therefore be

(4.3)    $E = 1 - \int_a^b P(x)\, dx + \sum_{r=2}^{\infty} r \left[ \int_a^b Y_{r-1}(S_{r-1})\, dS_{r-1} - \int_a^b Y_r(S_r)\, dS_r \right]$
         $= 1 + \sum_{r=1}^{\infty} \int_a^b Y_r(S_r)\, dS_r = 1 + \int_a^b \sum_{r=1}^{\infty} Y_r(x)\, dx$
         $= 1 + \int_a^b Y(x)\, dx.$

Thus, the amount of sampling expected before a terminating decision is reached
also depends upon the solution of the integral equation. We proceed to discuss
the problem when P(x) is given by a rectangular distribution.
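The recursion (3.2) and the sums (3.4) and (4.3) lend themselves to a direct numerical check. The sketch below is an illustration under stated assumptions, not the method of the paper: it replaces the integrals over [a, b] by a midpoint rule, and the names `solve_scheme`, `uniform_pdf` and `uniform_cdf` are hypothetical. For a rectangular density on (−1, 1) with a = −0.5 and b = 0.5 the kernel is constant on the continuation interval, so the probability of continuing is 1/2 at every stage and the expected amount of sampling is 1 + 1/2 + 1/4 + ... = 2, with acceptance and rejection equally likely by symmetry.

```python
def solve_scheme(pdf, cdf, a, b, n_grid=200, tol=1e-12, max_terms=10_000):
    """Iterate (3.2) on a midpoint grid over [a, b].

    Returns (probability of acceptance, probability of rejection,
    expected number of variates)."""
    h = (b - a) / n_grid
    xs = [a + (i + 0.5) * h for i in range(n_grid)]
    kernel = [[pdf(x - s) for s in xs] for x in xs]
    y_r = [pdf(x) for x in xs]       # Y_1 = P on the continuation interval
    p_accept = 1.0 - cdf(b)          # decision reached on the first variate
    p_reject = cdf(a)
    expected = 1.0
    for _ in range(max_terms):
        mass = h * sum(y_r)          # probability that a further variate is drawn
        if mass < tol:
            break
        expected += mass
        p_accept += h * sum(y * (1.0 - cdf(b - s)) for y, s in zip(y_r, xs))
        p_reject += h * sum(y * cdf(a - s) for y, s in zip(y_r, xs))
        y_r = [h * sum(k * y for k, y in zip(row, y_r)) for row in kernel]
    return p_accept, p_reject, expected


def uniform_pdf(x, c=1.0):
    """Rectangular density on (-c, c)."""
    return 1.0 / (2.0 * c) if -c <= x <= c else 0.0


def uniform_cdf(x, c=1.0):
    return min(1.0, max(0.0, (x + c) / (2.0 * c)))


p_acc, p_rej, e_n = solve_scheme(uniform_pdf, uniform_cdf, a=-0.5, b=0.5)
print(p_acc, p_rej, e_n)
```

The iteration converges here because M(b − a) = 1/2 < 1, in line with the convergence condition stated at the end of Section 3.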


5. Reduction to differential equations when P(x) is rectangular. Consider
the integral equation

(5.1)    $Y^*(z) = P^*(z) + \lambda \int_a^b P^*(z - t)\, Y^*(t)\, dt,$

where

(5.2)    $P^*(z) = 1/2c, \qquad -c < z - \alpha < +c,$
         $P^*(z) = 0, \qquad z - \alpha > c \ \text{or} \ z - \alpha < -c,$

and in the integral equation

$$a + \alpha - c < z < b + \alpha + c.$$

The parameter $\alpha$ is restricted to the values $-c < \alpha < c$ for the following reasons.
The rejection criterion a cannot be greater than $c + \alpha$ for, if so, rejection will be
automatic on the first sample. Similarly the acceptance criterion b must be
greater than $-c + \alpha$ for otherwise acceptance would be automatic on the first
trial. If $\alpha > c$, then rejection can never take place if it does not take place on
the first trial, for in this case all z > 0. Similarly, if $\alpha < -c$, then acceptance
can never take place if it does not take place on the first trial, for in this case all
z < 0. Furthermore, in obtaining solutions of the integral equation, we will
take $\alpha$ to be $\ge 0$. This inequality is no real restriction since solutions for negative
$\alpha$ can be obtained by symmetry from the solutions for positive $\alpha$.


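If we let $x = z - \alpha$ then

(5.3)    $Y^*(x + \alpha) = P^*(x + \alpha) + \lambda \int_a^b P^*(x + \alpha - t)\, Y^*(t)\, dt,$

or

(5.4)    $Y(x) = P(x) + \lambda \int_a^b P(x - t)\, Y^*(t)\, dt,$

where

(5.5)    $P(x) = 1/2c, \qquad -c < x < +c;$
         $P(x) = 0, \qquad x < -c \ \text{or} \ x > +c.$

Now let $s = t - \alpha$, then

(5.6)    $Y(x) = P(x) + \lambda \int_{a-\alpha}^{b-\alpha} P(x - \alpha - s)\, Y^*(s + \alpha)\, ds$
         $= P(x) + \lambda \int_{a-\alpha}^{b-\alpha} P(x - \alpha - s)\, Y(s)\, ds.$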

We have thus transformed our equation to one in which P(x) becomes symmetrical
with respect to the line x = 0. Furthermore, the probability of acceptance
becomes

(5.7)    $P_a = \int_{b-\alpha}^{\infty} Y(x)\, dx,$

and the expected amount of sampling becomes

(5.8)    $E = 1 + \int_{a-\alpha}^{b-\alpha} Y(x)\, dx.$

Also, x now has the following range: $a - c < x < b + c$. If we now make the
transformation $x - \alpha - s = y$, then

(5.9)    $Y(x) = P(x) + \lambda \int_{x-b}^{x-a} P(y)\, Y(x - \alpha - y)\, dy,$

and the following cases present themselves.

If $x - a < -c$ or $x - b > c$, then $Y(x) = P(x)$, since $P(y) \equiv 0$.



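If $x - b < -c < x - a < +c$, then

(5.10)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{-c}^{x-a} Y(x - \alpha - y)\, dy,$

where

(5.11)   $a - c < x < a + c$ when $b - a > 2c$,
         $a - c < x < b - c$ when $b - a < 2c$.

If $x - b < -c < +c < x - a$, then

(5.12)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{-c}^{c} Y(x - \alpha - y)\, dy,$

where

(5.13)   $a + c < x < b - c$ and $b - a > 2c$.

If $-c < x - b < x - a < +c$, then

(5.14)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{x-b}^{x-a} Y(x - \alpha - y)\, dy,$

where

(5.15)   $b - c < x < a + c$ and $b - a < 2c$.

If $-c < x - b < +c < x - a$, then

(5.16)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{x-b}^{c} Y(x - \alpha - y)\, dy,$

where

(5.17)   $b - c < x < b + c$ when $b - a > 2c$,
         $a + c < x < b + c$ when $b - a < 2c$.

Transforming back to the variable s, we have for the case $b - a > 2c$,

(5.18)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{x-\alpha+c} Y(s)\, ds$ for $a - c < x < a + c$,
         $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{x-\alpha+c} Y(s)\, ds$ for $a + c < x < b - c$,
         $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{b-\alpha} Y(s)\, ds$ for $b - c < x < b + c$,

and for the case $b - a < 2c$,

(5.19)   $Y(x) = P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{x-\alpha+c} Y(s)\, ds$ for $a - c < x < b - c$,
         $= P(x) + \frac{\lambda}{2c} \int_{a-\alpha}^{b-\alpha} Y(s)\, ds$ for $b - c < x < a + c$,
         $= P(x) + \frac{\lambda}{2c} \int_{x-\alpha-c}^{b-\alpha} Y(s)\, ds$ for $a + c < x < b + c$.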


In all of the above equations, the integral is a continuous function of x, $\alpha$, a, b, c
while P(x) has a discontinuity at x = +c and x = −c, the jump at these points
being of amount 1/2c. The function Y(x) will therefore be such that

(5.20)   $Y(-c + 0) - Y(-c - 0) = 1/2c,$
         $Y(c - 0) - Y(c + 0) = 1/2c.$

If we now differentiate the above sets of integral equations with respect to x we
obtain the following sets of differential-difference equations for the case $\lambda = 1$.
If $b - a > 2c$,

(5.21)   $Y'(x) = \frac{1}{2c} Y(x - \alpha + c)$ for $a - c < x < a + c$,
         $= \frac{1}{2c} \{Y(x - \alpha + c) - Y(x - \alpha - c)\}$ for $a + c < x < b - c$,
         $= -\frac{1}{2c} Y(x - \alpha - c)$ for $b - c < x < b + c$,

and, if $b - a < 2c$,

(5.22)   $Y'(x) = \frac{1}{2c} Y(x - \alpha + c)$ for $a - c < x < b - c$,
         $= 0$ for $b - c < x < a + c$,
         $= -\frac{1}{2c} Y(x - \alpha - c)$ for $a + c < x < b + c$,

the derivatives holding for all points except at x = −c and x = +c.

Although a technique has been devised to solve the above equations for finite
a and b, mathematical difficulties of a computational character are encountered
when (b − a) is made large. Note that there are only three essential parameters
in the above problem since c can be taken as the unit of measurement. In the
technique illustrated by the following examples, $\alpha$ has been fixed as has (b − a),
i.e. the solutions shown in the examples below are general only insofar as one
parameter is concerned. The essential feature of the technique is that the range
of Y(x) has been further subdivided so as to make its points of discontinuity end
points of subdivisions of its range, and thus Y(x) becomes continuous from the
right or left in every subinterval of its range.

6. Example I: b − a = 2c, c = 1, $\alpha$ = 0. In this case x ranges from (a − 1)
or (−c), whichever is smaller, to (b + 1) or (+c), whichever is larger. If
−c < a − 1, then Y(x) = P(x) for −c < x < a − 1, and if b + 1 < +c,
then Y(x) = P(x) for b + 1 < x < +c. For x between a − 1 and b + 1 the
domain of the differential-difference equations is divided as follows, where a
is now restricted to the interval $-1 \le a \le 0$.



(6.1)    $Y_i'(x) = \tfrac{1}{2} Y_{i+2}(x + 1)$, where for
             i = 1,  a − 1 < x < −1,
             i = 2,  −1 < x < a,
             i = 3,  a < x < 0,
             i = 4,  0 < x < a + 1;
         $Y_i'(x) = -\tfrac{1}{2} Y_{i-2}(x - 1)$, where for
             i = 5,  a + 1 < x < +1,
             i = 6,  +1 < x < a + 2,
             i = 7,  a + 2 < x < +2,
             i = 8,  2 < x < a + 3.

The above are the equations corresponding to (5.21) for the given example.
Differentiating the equations for i = 3, 4, 5, 6 and making certain obvious
substitutions we obtain the following second order differential equations,

(6.2)    $Y_i''(x) = -\tfrac{1}{4} Y_i(x), \qquad i = 3, 4, 5, 6,$

where the domains for x are as in (6.1). If we solve the equations (6.2) and substitute
in the remaining equations in (6.1) we obtain the following set of equations,

(6.3)    $Y_i(x) = A_{i+2} \sin \tfrac{1}{2}(x + 1) - B_{i+2} \cos \tfrac{1}{2}(x + 1) + K_i, \qquad i = 1, 2,$
         $Y_i(x) = A_i \cos \tfrac{1}{2}x + B_i \sin \tfrac{1}{2}x, \qquad i = 3, 4, 5, 6,$
         $Y_i(x) = -A_{i-2} \sin \tfrac{1}{2}(x - 1) + B_{i-2} \cos \tfrac{1}{2}(x - 1) + K_i, \qquad i = 7, 8,$

where again the domains are as in (6.1).

From continuity considerations we have the boundary conditions

$Y_1(a - 1) = Y_8(a + 3) = 0, \quad Y_1(-1) + \tfrac{1}{2} = Y_2(-1), \quad Y_2(a) = Y_3(a),$
$Y_3(0) = Y_4(0), \quad Y_4(a + 1) = Y_5(a + 1), \quad Y_5(1) = Y_6(1) + \tfrac{1}{2},$
$Y_6(a + 2) = Y_7(a + 2), \quad Y_7(2) = Y_8(2).$

These boundary conditions yield certain relationships between the constants.
The equations so determined, however, do not form a consistent set of linear
equations in the $A_i$, $B_i$, $K_i$. If we integrate out the equations (5.18)
sectionally, the following relationships between the constants are obtained.

vlj = A,+ 2 sini - B.+ 2 Cos^, 5; = B ,+2 cos J 4 -.Bi +2 ein 5 , 4 = 3,4, 

Ki - -(Ai - As) sin ^(a 4- 1) - {Bi - Bt) cos i(a 4 - 1 ) 

= i Bi ~ Bi Ki, 

K^ = As - ^4 4- li's, ‘ Ks = A« sm l{a 4 - 2 ) - 5,, cos |(a 4- 2), 

(6.4) 

£3 = ^ 4" .Br 4~ 7^1 “ A 4 4" As 4~ (A 4 ~ As) sin |(a 4" 1) 

4- {Bi - Bi) cos Ka + 1), 

A^ = Aa 4- Ki + (As - As) sin l{a 4- 1) 4- (.^6 - Bi) cos Ko 4- 1), 

= - As sm ^ 4-Bs cos ^. 





From these equations it is easily seen that $A_3 = A_4$ and $K_1 = K_2 = K_7 = K_8$.
Furthermore, the following set of consistent linear equations is obtained, after
several simple manipulations and substitutions.


sin + 2 ) + sin^ ■ sin ^}Ae 

A 


- jcos Ba + jcos Ka + 2 ) + sin - cos §j> Ba = 0 , 


(6.5) < - sin + 2) + cos K® + 2) — cos ;r • sin ilo + < sin 5 f .Bs 


+ <sin ^(a + 2 ) + cos Ha + 2 ) — cos j; ■ cos |> Be = 0 , 


{cos Ae — B 3 + {sin Be = 0 . 


All the other constants can be obtained from the solutions for Ae, Bj, Be in 

(6.5) Letting A equal the determinant of coefficients in (6.5) and using the 
relationships (6.4) we obtain the following solutions: 

A = 2 — 2 sin f — cos I, 

AAi = ^(cos i - cos a /2 • sin i(a + 1 )) = AAj , 

AB 4 = ^{sin J — sin a /2 • sin ^(a + 1) + cos ^ ~1), 

AAe = ^{sin 1 — cos 5 + cos a/2 • cos i(a + 2 )}, 

( 6 . 6 ) ABfl = |{sm ^(a + 2 ) cos a /2 — sin ^ — cos 1 ), 

= i{l “ sin a /2 • sin + 1 ) — sin J), 

AAe = Mcos 5 - sin’ j(a + 1 )}, 

ABe = |{sin ^ + sin j(a + 1 ) • cos |(o + 1 ) - 1 }, 

Affi = 1 |cos ^ sin J(a 4- 2)| = AIQ = AIQ = AKg. 

If we now integrate Y(x), equation (6.3), sectionally, i.e. from the left end point
to the right end point of each sub-interval of its range, and then add up appropriate
areas, we obtain the following formulae for the probabilities of acceptance
and rejection and for the expected amount of sampling:

B« = ^ jcos Ka + 1 ) 4- sin a/2 — cos a/2 4 - AKsj, 

B.i = i {2 - cos — 2 sin I 4 - sin J(o 4 - 1 ) 

— cos Ko 4 - 1) — sin a/2 4 - A/Cj), 
B = ^ {cos a/2 — 2 sin a/2 — siii K® + 1)). 


(6.7) 





7. Example II: $\alpha$ = 1, c = 3, b − a = 4. In this case, as in the previous one,
Y(x) = P(x) for −3 < x < a − 3 when a − c = a − 3 > −3, and if b + c =
a + 7 < 3 then Y(x) = P(x) for a + 7 < x < 3. For a − 3 < x < a + 7, where
a takes on only integral values between −5 and 3, we have the following set of
differential-difference equations:

(7.1)    $Y'_{a+j}(x) = \tfrac{1}{6} Y_{a+j+2}(x + 2), \qquad j = -3, -2, -1, 0,$
         $= 0, \qquad j = 1, 2,$
         $= -\tfrac{1}{6} Y_{a+j-4}(x - 4), \qquad j = 3, 4, 5, 6.$


If we integrate the above equations for j = 1, 2, substitute in the equations for
j = −1, 0, 5, 6, integrate, and then substitute in the remaining equations, we
obtain the solutions


(7.2) 


= Aa+3+i(x + 2)^ + X + Aa~3 , 

j = 

-3, - 

“ 'i‘-^a+j4-2 -^a+j 

J = 

-1,0; 

= Aa-^-j 

J = 

1,2; 

= — s(iC 4) 5j4a-j.3_4 X “|- > 

j = 

3,4; 

” X “1“ Aa-\-3 ) 

J = 

6, 6 


As in the previous example we now use (5.22). Integrating out (5.22) sectionally,
certain relationships between the $A_{a+j}$, j = −3, −2, ..., 6, are obtained.
These yield

.''lo+i = ^(12Pfl_i -j- 12Pa -f 39P«+) -4- 9 P 0 + 2 ), 

A«+2 = uV(12P„_l + 12P„ -h llPa+l + 37P„+2l, 


{4P<.-i + 4P. + 13Pa+i + 3P 

66 


0+2 J 


(7.3) 


+ ^\22SPa-i + 60P„ + 55P.4.1 + 17Pa+2), 


^ {12P„_i + 12P. -h llPa^i + 37P,+2) 
luo 


+ + 228P„ + 55P„+1 + l7Pa+2). 

where $P_{a+j}$ is the value of P(x) for a + j < x < a + j + 1, j = −3, ..., 6.
All of the other constants can be found in terms of those given in equations
(7.3). If we now integrate (7.2) sectionally and perform several simple manipulations,
we arrive at the following formulas:


-a 


Ph = £ Pa+y + -^<-+1 + 


3a - 1 
216 


a+ 2 ”1” ^ 2 ^^ 12^^* 


(7 4) Pa = P»+y + 216^ ~ 


E = I + ^ - — A«41 + ^ ~ ]2 ~ ■^“+* 


12 


12 





Although $P_{a+j}$, j = −6, −5, −4, 7, 8, 9, have not appeared in previous derivations
in this example, they have been inserted in the above formulas to cover the
cases in which a − c > −c or b + c < c.

It should be mentioned that Kac [5] obtained the distribution of n (the expected
amount of sampling) by a process similar to that given in this paper. It
is also interesting to note that the present paper could have been written entirely
in the language of problems in Random Walk.

The author has also worked on the cases in which the distribution P(x) is triangular
and parabolic. In these, as in the case of the rectangular distribution
discussed in this paper for b − a large, the equivalent differential-difference
equations are of large orders, making the computation of solutions extremely
tedious. As Kac [5] pointed out, the task of obtaining solutions in closed form
for the case when P(x) is the normal law is extremely difficult.

REFERENCES 

[1] W. Bartky, "Multiple sampling with constant probability", Annals of Math. Stat., Vol. 14 (1943), p. 363.

[2] A. Wald, "Sequential tests of statistical hypotheses", Annals of Math. Stat., Vol. 16 (1945), p. 117.

[3] A. Wald, "On cumulative sums of random variables", Annals of Math. Stat., Vol. 15 (1944), pp. 283-284.

[4] E. Goursat, Cours d'Analyse Mathématique, Gauthier-Villars, Paris, 1923, Vol. 3.

[5] M. Kac, "Random walk in the presence of absorbing barriers", Annals of Math. Stat., Vol. 16 (1945), p. 62.



ON THE CHARACTERISTIC FUNCTIONS OF THE DISTRIBUTIONS OF 
ESTIMATES OF VARIOUS DEVIATIONS IN SAMPLES 
FROM A NORMAL POPULATION 

By M. Kac
Cornell University 


1. Summary. An explicit formula for the characteristic function of the
deviation

$\frac{1}{n}\sum_{k=1}^{n} |X_k - \bar{X}|^{\alpha}, \quad \alpha > 0,$

is derived for samples from a normal population. For $\alpha = 1$ one can calculate
the probability density function but the result does not seem to be in complete
agreement with a recent formula of Goodwin [1].


2. Introduction. Let $X_1, X_2, \ldots, X_n$ be independent, normally distributed
random variables each having mean 0 and variance 1.

Let, as usual,

$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$

and denote by $Y_n(\alpha)$ the deviation

(1) $Y_n(\alpha) = \frac{1}{n}\sum_{k=1}^{n} |X_k - \bar{X}|^{\alpha}, \quad \alpha > 0.$

The purpose of this note is to show that

(2) $F_n(\xi) = E\{\exp(i\xi Y_n(\alpha))\} = \frac{1}{\sqrt{2\pi n}} \int_{-\infty}^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\, e^{\frac{i}{n}(\xi |x|^{\alpha} + \eta x)}\, dx \right]^{n} d\eta.$

It is easy to check that for $\alpha = 2$ one obtains the well known expression

$F_n(\xi) = \left(1 - \frac{2i\xi}{n}\right)^{-(n-1)/2}.$

Moreover, if $\alpha = 1$ one can actually find the probability density of $Y_n(1)$. The
resulting expression is fairly complicated and it strongly resembles an expression
recently obtained by Goodwin [1]. Except for the relatively simple case
n = 3, it does not seem easy to verify that our formula is equivalent to that of
Goodwin.

Although deviations corresponding to values of $\alpha$ different from 1 and 2 are of
little practical value the explicit formula (2) may be of some interest. It is
also hoped that the method of derivation may prove useful in other cases.
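As a rough numerical illustration of the case $\alpha = 2$, the sketch below compares a Monte Carlo estimate of $E\{\exp(i\xi Y_n(2))\}$ with the closed form $(1 - 2i\xi/n)^{-(n-1)/2}$, which follows from the fact that $nY_n(2)$ has the $\chi^2$ distribution with n - 1 degrees of freedom. The function names, the number of draws and the seed are illustrative assumptions and are not part of the paper; the simulation checks the closed form only, not the double integral in (2).

```python
import cmath
import random

def simulate_cf(xi, n, alpha, draws=40_000, seed=0):
    """Monte Carlo estimate of E[exp(i*xi*Y_n(alpha))] for samples from N(0, 1)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        # the deviation Y_n(alpha) = (1/n) * sum |X_k - Xbar|^alpha
        y = sum(abs(v - xbar) ** alpha for v in x) / n
        total += cmath.exp(1j * xi * y)
    return total / draws

def cf_alpha_2(xi, n):
    """Closed form for alpha = 2: n*Y_n(2) is chi-square with n - 1 degrees of freedom."""
    return (1 - 2j * xi / n) ** (-(n - 1) / 2)

if __name__ == "__main__":
    print(simulate_cf(1.3, 5, 2.0), cf_alpha_2(1.3, 5))
```

The two printed complex numbers should agree to within the Monte Carlo error, which is of order $1/\sqrt{\text{draws}}$.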



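3. The derivation of (2). We start with the observation that

$\bar{X}$ and $Y_n(\alpha)$

are statistically independent (see e.g. Daly [2]).

Denote by

$E^*\{|\bar{X}| < \epsilon, \exp(i\xi Y_n(\alpha))\}$

the integral of $\exp(i\xi Y_n(\alpha))$ extended over that portion of the sample space in
which $|\bar{X}| < \epsilon$. Thus the conditional expectation $E\{\exp(i\xi Y_n(\alpha)) \mid |\bar{X}| < \epsilon\}$
is given by the formula

$E\{\exp(i\xi Y_n(\alpha)) \mid |\bar{X}| < \epsilon\} = \frac{E^*\{|\bar{X}| < \epsilon, \exp(i\xi Y_n(\alpha))\}}{\mathrm{Prob}\{|\bar{X}| < \epsilon\}}.$

Because of the independence of $\bar{X}$ and $Y_n(\alpha)$ we have

(3) $E\{\exp(i\xi Y_n(\alpha))\} = \frac{E^*\{|\bar{X}| < \epsilon, \exp(i\xi Y_n(\alpha))\}}{\mathrm{Prob}\{|\bar{X}| < \epsilon\}}.$

For the sake of simplicity we assume now that $\alpha \ge 1$ and note that

$\left| \exp(i\xi Y_n(\alpha)) - \exp\left(\frac{i\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha}\right) \right| \le \frac{|\xi|}{n}\sum_{k=1}^{n} \left| |X_k|^{\alpha} - |X_k - \bar{X}|^{\alpha} \right| \le \frac{\alpha|\xi||\bar{X}|}{n}\sum_{k=1}^{n} (|X_k| + |\bar{X}|)^{\alpha-1}.$

Thus, on the portion of the sample space where $|\bar{X}| < \epsilon$, we have

$\left| \exp(i\xi Y_n(\alpha)) - \exp\left(\frac{i\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha}\right) \right| < \frac{\alpha|\xi|\epsilon}{n}\sum_{k=1}^{n} (|X_k| + \epsilon)^{\alpha-1}$

and consequently

$\left| E^*\{|\bar{X}| < \epsilon, \exp(i\xi Y_n(\alpha))\} - E^*\left\{|\bar{X}| < \epsilon, \exp\left(\frac{i\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha}\right)\right\} \right| < \frac{\alpha|\xi|\epsilon}{n} E^*\left\{|\bar{X}| < \epsilon, \sum_{k=1}^{n} (|X_k| + \epsilon)^{\alpha-1}\right\}.$

Clearly $E^*\{|\bar{X}| < \epsilon, \sum_{k} (|X_k| + \epsilon)^{\alpha-1}\}$ approaches 0 as $\epsilon$ approaches 0,
hence by (3)

(4) $E\{\exp(i\xi Y_n(\alpha))\} = \lim_{\epsilon \to 0} \frac{E^*\{|\bar{X}| < \epsilon, \exp(i\xi Y_n(\alpha))\}}{\mathrm{Prob}\{|\bar{X}| < \epsilon\}} = \lim_{\epsilon \to 0} \frac{E^*\left\{|\bar{X}| < \epsilon, \exp\left(\frac{i\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha}\right)\right\}}{\mathrm{Prob}\{|\bar{X}| < \epsilon\}}.$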





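Using the fact that

$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \epsilon\eta}{\eta} \exp(i\eta X)\, d\eta = 1$ for $|X| < \epsilon$, $= \tfrac12$ for $|X| = \epsilon$, $= 0$ for $|X| > \epsilon$,

we obtain easily

(5) $E^*\left\{|\bar{X}| < \epsilon, \exp\left(\frac{i\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha}\right)\right\} = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \epsilon\eta}{\eta}\, E\left\{\exp i\left(\frac{\xi}{n}\sum_{k=1}^{n}|X_k|^{\alpha} + \eta\bar{X}\right)\right\} d\eta.$

The justification of interchanging the order of integration (from $-\infty$ to $\infty$) and
the operation E can be made quite simply (see e.g. Kac and Steinhaus [3]).

Notice now that

$E\left\{\exp \frac{i}{n}\sum_{k=1}^{n} (\xi |X_k|^{\alpha} + \eta X_k)\right\} = \left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2} \exp\frac{i}{n}(\xi|x|^{\alpha} + \eta x)\, dx\right]^{n} = \varphi_n(\xi, \eta)$

and that $\varphi_n(\xi, \eta)$ is absolutely integrable in $(-\infty, \infty)$ as a function of $\eta$.
Thus, as $\epsilon \to 0$,

(6) $\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \epsilon\eta}{\eta}\, \varphi_n(\xi, \eta)\, d\eta \sim \frac{\epsilon}{\pi}\int_{-\infty}^{\infty} \varphi_n(\xi, \eta)\, d\eta.$

Furthermore (as $\epsilon \to 0$)

(7) $\mathrm{Prob}\{|\bar{X}| < \epsilon\} \sim 2\epsilon \frac{\sqrt{n}}{\sqrt{2\pi}}$

and combining this with (6), (5) and (4) we get

(8) $E\{\exp(i\xi Y_n(\alpha))\} = \frac{1}{\sqrt{2\pi n}}\int_{-\infty}^{\infty} \varphi_n(\xi, \eta)\, d\eta.$

This, of course, is equivalent to (2).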

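4. Density function of the mean deviation. If $\alpha = 1$ one can obtain an
expression for the probability density $f_n(\beta)$ of $Y_n(1)$. We note first that

$\int_{-\infty}^{\infty} \exp\left(-\frac{x^2}{2}\right) \exp\frac{i}{n}(\xi|x| + \eta x)\, dx = n\int_{0}^{\infty} \exp\left(-\frac{n^2x^2}{2}\right) \exp i(\xi + \eta)x\, dx + n\int_{0}^{\infty} \exp\left(-\frac{n^2x^2}{2}\right) \exp i(\xi - \eta)x\, dx = n\{R(\xi + \eta) + R(\xi - \eta)\}$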





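where

$R(u) = \int_{0}^{\infty} \exp\left(-\frac{n^2x^2}{2}\right) \exp(iux)\, dx.$

Using (2) (with $\alpha = 1$) we obtain

$F_n(\xi) = \frac{n^{n-1/2}}{(\sqrt{2\pi})^{n+1}} \sum_{k=0}^{n} \binom{n}{k} \int_{-\infty}^{\infty} R^k(\xi + \eta) R^{n-k}(\xi - \eta)\, d\eta.$

Let us first look at the summands corresponding to k = 0 and k = n. We have

$\int_{-\infty}^{\infty} R^n(\xi - \eta)\, d\eta = \int_{-\infty}^{\infty} R^n(\eta)\, d\eta = \int_{-\infty}^{\infty} R^n(\xi + \eta)\, d\eta.$

Now, $R(\eta)$ is the Fourier transform of

$f(x) = \exp(-n^2x^2/2)$ for $x \ge 0$, $f(x) = 0$ for $x < 0$,

and hence $R^n(\eta)$ is the Fourier transform of the convolution

$f * f * \cdots * f = f^{(n)}(x).$

It is easily seen (using integration by parts) that

$R(\eta) = O\left(\frac{1}{|\eta|}\right)$

for large $|\eta|$ and hence for $n \ge 2$, $R^n(\eta)$ is absolutely integrable in $(-\infty, \infty)$.
It follows (classical inversion formula) that

$\int_{-\infty}^{\infty} R^n(\eta)\, d\eta = 2\pi f^{(n)}(0).$

Since for $n \ge 2$, $f^{(n)}(x)$ is continuous and $f^{(n)}(x) = 0$ for $x < 0$ we
have $f^{(n)}(0) = 0$. Thus

$F_n(\xi) = \frac{n^{n-1/2}}{(\sqrt{2\pi})^{n+1}} \sum_{k=1}^{n-1} \binom{n}{k} \int_{-\infty}^{\infty} R^k(\xi + \eta) R^{n-k}(\xi - \eta)\, d\eta.$

It is fairly easy to check that

$\int_{-\infty}^{\infty} R^k(\xi + \eta) R^{n-k}(\xi - \eta)\, d\eta = \pi \int_{0}^{\infty} \exp(i\xi x)\, f^{(k)}\left(\frac{x}{2}\right) f^{(n-k)}\left(\frac{x}{2}\right) dx$

so that

$F_n(\xi) = \frac{\pi\, n^{n-1/2}}{(\sqrt{2\pi})^{n+1}} \sum_{k=1}^{n-1} \binom{n}{k} \int_{0}^{\infty} \exp(i\xi x)\, f^{(k)}\left(\frac{x}{2}\right) f^{(n-k)}\left(\frac{x}{2}\right) dx.$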




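Finally,

$f_n(\beta) = \frac{\pi\, n^{n-1/2}}{(\sqrt{2\pi})^{n+1}} \sum_{k=1}^{n-1} \binom{n}{k} f^{(k)}\left(\frac{\beta}{2}\right) f^{(n-k)}\left(\frac{\beta}{2}\right), \quad \beta \ge 0.$

I have not been able, except for n = 3, to verify directly that this formula is
identical with that of Goodwin.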


REFERENCES 

[1] H. J. Goodwin, "On the distribution of the estimate of mean deviation obtained from samples from a normal population," Biometrika, Vol. 33 (1946), pp. 254-256.

[2] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," Annals of Math. Stat., Vol. 17 (1946), pp. 71-74.

[3] M. Kac and H. Steinhaus, "Sur les fonctions indépendantes III," Studia Math., Vol. 6 (1936), pp. 93-94.



NOTES 

This section is devoted to brief research and expository articles and other short items.


A FUNCTIONAL EQUATION FOR WISHART’S DISTRIBUTION 

By G. Rasch

State Serum Institute and University of Copenhagen

1. Introduction. The sampling distribution of the moment matrix for
observations from a multivariate normal distribution was given by Wishart in
1928 [1]. This proof involved rather advanced multidimensional geometry but
since then two analytical proofs have been given; one by Wishart and Bartlett
in cooperation with Ingham by the use of the characteristic function [2] and a
second by Hsu by induction with regard to the dimension of the
observations [3].

In the following section is given a new derivation of the form of Wishart's
distribution in which a fundamental property of the multivariate normal
distribution is utilized, viz. the invariance of the distribution type against a linear
transformation. In section 3 the same principle is used for evaluation of the constant
and determination of the moment matrix in the multidimensional normal
distribution.


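2. Derivation of Wishart's distribution. Let¹

(1) $\mathbf{x} = (x_1, \ldots, x_k)$

denote a k-dimensional normal variate with the mean vector 0 and the distribution
matrix

(2) $\boldsymbol{\Phi} = (\varphi_{ij}),$

viz.

(3) $p\{\mathbf{x}\} = \frac{\sqrt{\Delta(\boldsymbol{\Phi})}}{(\sqrt{2\pi})^{k}}\, e^{-\frac12 \mathbf{x}\boldsymbol{\Phi}\mathbf{x}^*}.$

$\boldsymbol{\Phi}$ is symmetrical and positive definite.

Now consider n observations of $\mathbf{x}$: $\mathbf{x}_1, \ldots, \mathbf{x}_n$, which are stochastically
independent. Their joint distribution is

(4) $p\{\mathbf{x}_1, \ldots, \mathbf{x}_n\} = \left(\frac{\sqrt{\Delta(\boldsymbol{\Phi})}}{(\sqrt{2\pi})^{k}}\right)^{n} e^{-\frac12 \sum_{\nu} \mathbf{x}_\nu\boldsymbol{\Phi}\mathbf{x}_\nu^*}.$

The estimation of $\boldsymbol{\Phi}$ is based upon the moment sums

$m_{ij} = \sum_{\nu} x_{\nu i}\, x_{\nu j},$

¹ Notations. Lower case latin and greek letters are scalars, boldface capital latin and
greek letters denote matrices, and boldface lower case letters row vectors. * means
transposition. $\Delta(\mathbf{A})$ stands for the determinant of the square matrix $\mathbf{A}$.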



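which form the symmetrical, positive definite matrix

(5) $\mathbf{M} = (m_{ij}) = \sum_{\nu} \mathbf{x}_\nu^*\mathbf{x}_\nu.$

In order to derive the distribution of $\mathbf{M}$ the straightforward procedure seems to
be to transform the distribution of the sample $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ to a distribution of
$\mathbf{M}$ and some other variables which then should be integrated away. As such,
the transformation,

(6) $\mathbf{x}_\nu = \mathbf{u}_\nu \mathbf{M}^{1/2}, \quad \sum_{\nu} \mathbf{u}_\nu^*\mathbf{u}_\nu = \mathbf{1},$

might serve. The matrix

(7) $\mathbf{U} = \begin{pmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_n \end{pmatrix}$

contains nk elements linked together with $\binom{k+1}{2}$ relations; $(\mathbf{U})$ symbolizes
$\left(n - \frac{k+1}{2}\right)k$ of the elements taken as independent variables.

For the purpose of introducing $\mathbf{M}$ in the exponential term in (4) we shall
define the "double dot multiplication" of two matrices:

(8) $\mathbf{A} \cdot\cdot\, \mathbf{B} = (a_{ij}) \cdot\cdot\, (b_{ij}) = \sum_{i}\sum_{j} a_{ij} b_{ij},$

for which we notice the rule

(9) $\mathbf{A} \cdot\cdot\, (\mathbf{BCD}) = \mathbf{C} \cdot\cdot\, (\mathbf{B}^*\mathbf{A}\mathbf{D}^*).$

As obviously

$\mathbf{x}\boldsymbol{\Phi}\mathbf{x}^* = \sum \varphi_{ij} x_i x_j = \boldsymbol{\Phi} \cdot\cdot\, (\mathbf{x}^*\mathbf{x}),$

we have

(10) $\sum_{\nu} \mathbf{x}_\nu\boldsymbol{\Phi}\mathbf{x}_\nu^* = \boldsymbol{\Phi} \cdot\cdot\, \mathbf{M},$

and accordingly

(11) $p\{\mathbf{M}, (\mathbf{U})\} = \left(\frac{\sqrt{\Delta(\boldsymbol{\Phi})}}{(\sqrt{2\pi})^{k}}\right)^{n} e^{-\frac12 \boldsymbol{\Phi} \cdot\cdot\, \mathbf{M}}\, \frac{\partial(\mathbf{x}_1, \ldots, \mathbf{x}_n)}{\partial(\mathbf{M}, (\mathbf{U}))}$

where $\frac{\partial(\mathbf{x}_1, \ldots, \mathbf{x}_n)}{\partial(\mathbf{M}, (\mathbf{U}))}$ denotes the jacobian of the transformation. On integrating with
respect to $(\mathbf{U})$ we obtain

(12) $p\{\mathbf{M}\} = (\sqrt{\Delta(\boldsymbol{\Phi})})^{n} \cdot e^{-\frac12 \boldsymbol{\Phi} \cdot\cdot\, \mathbf{M}} \cdot \varphi(\mathbf{M}),$

where $\varphi(\mathbf{M})$ is independent of $\boldsymbol{\Phi}$. From this it follows that $p\{\mathbf{x}_1, \ldots, \mathbf{x}_n \mid \mathbf{M}\}$
is independent of $\boldsymbol{\Phi}$, i.e. $\mathbf{M}$ is a sufficient statistic for $\boldsymbol{\Phi}$.

In order to determine the mathematical form of $\varphi(\mathbf{M})$ we shall apply an
arbitrary linear transformation to the original variates:

(13) $\mathbf{x}_\nu = \mathbf{x}'_\nu \mathbf{A}.$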



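The new variates $\mathbf{x}'_\nu$ are obviously normally distributed about 0 with the
distribution matrix

(14) $\boldsymbol{\Phi}' = \mathbf{A}\boldsymbol{\Phi}\mathbf{A}^*.$

Therefore the distribution function of the new moment matrix, given by

(15) $\mathbf{M} = \mathbf{A}^*\mathbf{M}'\mathbf{A},$

is

(16) $p\{\mathbf{M}'\} = (\sqrt{\Delta(\boldsymbol{\Phi}')})^{n} \cdot e^{-\frac12 \boldsymbol{\Phi}' \cdot\cdot\, \mathbf{M}'} \cdot \varphi(\mathbf{M}').$

On the other hand the transformation from $\mathbf{M}$ to $\mathbf{M}'$ is a linear one, the jacobian
of which therefore is a constant depending on $\mathbf{A}$ only:

(17) $\frac{\partial(\mathbf{M})}{\partial(\mathbf{M}')} = \psi(\mathbf{A})$, say.

Consequently,

(18) $p\{\mathbf{M}'\} = (\sqrt{\Delta(\boldsymbol{\Phi})})^{n} \cdot e^{-\frac12 \boldsymbol{\Phi} \cdot\cdot\, \mathbf{M}} \cdot \varphi(\mathbf{M}) \cdot |\psi(\mathbf{A})|.$

The two expressions for $p\{\mathbf{M}'\}$ must be identical, and as

(19) $\Delta(\boldsymbol{\Phi}') = \Delta(\boldsymbol{\Phi})\Delta^2(\mathbf{A}),$

and

(20) $\boldsymbol{\Phi}' \cdot\cdot\, \mathbf{M}' = (\mathbf{A}\boldsymbol{\Phi}\mathbf{A}^*) \cdot\cdot\, \mathbf{M}' = (\mathbf{A}^*\mathbf{M}'\mathbf{A}) \cdot\cdot\, \boldsymbol{\Phi} = \mathbf{M} \cdot\cdot\, \boldsymbol{\Phi},$

it follows that $\varphi(\mathbf{M})$ satisfies the functional equation

(21) $|\Delta(\mathbf{A})|^{n}\, \varphi(\mathbf{M}') = \varphi(\mathbf{M}) \cdot |\psi(\mathbf{A})|.$

Now, since the transformation $\mathbf{M} = (\mathbf{AB})^*\mathbf{M}'(\mathbf{AB})$ may be carried out in two
steps, $\psi(\mathbf{A})$ also satisfies a functional equation

(22) $\psi(\mathbf{AB}) = \psi(\mathbf{A})\psi(\mathbf{B}).$

Furthermore, if $\mathbf{A}$ is a diagonal matrix it is easily seen that

(23) $\psi(\mathbf{A}) = (\Delta(\mathbf{A}))^{k+1},$

and this relation holds generally. In fact, considering the case where the normal
form of $\mathbf{A}$ is a diagonal matrix:

$\mathbf{A} = \mathbf{T}\mathbf{D}\mathbf{T}^{-1}$, say,

we get

$\psi(\mathbf{A}) = \psi(\mathbf{T})\psi(\mathbf{D})\psi(\mathbf{T}^{-1}) = (\Delta(\mathbf{D}))^{k+1}\, \psi(\mathbf{T}\mathbf{T}^{-1}) = (\Delta(\mathbf{A}))^{k+1}$

and by analytical continuation this is seen to be true for any $\mathbf{A}$.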
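The relation $\psi(\mathbf{A}) = (\Delta(\mathbf{A}))^{k+1}$ can be checked numerically for a particular matrix. The following is a minimal sketch, assuming that the jacobian is taken with respect to the $k(k+1)/2$ elements $m_{ij}$ with $i \le j$; the helper names `det` and `congruence_jacobian` are illustrative and do not come from the paper.

```python
from itertools import combinations_with_replacement

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-14:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def congruence_jacobian(A):
    """Matrix of the linear map M' -> M = A* M' A on the free entries m_ij, i <= j."""
    k = len(A)
    idx = list(combinations_with_replacement(range(k), 2))
    J = []
    for (i, j) in idx:              # output entry m_ij
        row = []
        for (r, s) in idx:          # input entry m'_rs (equal to m'_sr)
            coef = A[r][i] * A[s][j]
            if r != s:              # off-diagonal entries enter twice by symmetry
                coef += A[s][i] * A[r][j]
            row.append(coef)
        J.append(row)
    return J

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [0.5, 3.0, 1.0], [1.0, -1.0, 4.0]]
    print(det(congruence_jacobian(A)), det(A) ** (len(A) + 1))
```

For the 3 x 3 matrix used in the example both printed numbers should equal $\Delta(\mathbf{A})^4$ up to rounding.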





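Now, inserting this result in the functional equation (21) and taking for $\mathbf{A}$ the
real symmetrical square root of $\mathbf{M}$ so that² $\mathbf{M}' = \mathbf{1}$, we readily obtain the
solution

(24) $\varphi(\mathbf{M}) = (\Delta(\mathbf{M}^{1/2}))^{n-k-1}\, \varphi(\mathbf{1}).$

It follows that

(25) $p\{\mathbf{M}\} = \gamma_k(n)\, (\Delta(\boldsymbol{\Phi}))^{n/2} \cdot e^{-\frac12 \boldsymbol{\Phi} \cdot\cdot\, \mathbf{M}} \cdot (\Delta(\mathbf{M}))^{(n-k-1)/2}$

where $\gamma_k(n) = \varphi(\mathbf{1})$ is a constant which may be determined in various ways (cf.
for instance Cramér [4]).

3. Other applications of the linear transformation. It may be noticed that
the linear transformation also leads to simple derivations of two fundamental
properties of the normal multivariate distribution itself, viz. determination of
the constant and the relation between the moment matrix and the distribution
matrix.

Let

(26) $p\{\mathbf{x}\} = \gamma(\boldsymbol{\Phi}) \cdot e^{-\frac12 \mathbf{x}\boldsymbol{\Phi}\mathbf{x}^*}$

and transform by

(27) $\mathbf{x} = \mathbf{x}'\mathbf{A}.$

The new variable obviously has the distribution matrix (14) and the constant $\gamma(\boldsymbol{\Phi}')$.
But on the other hand direct transformation of (26) leads to

$p\{\mathbf{x}'\} = \gamma(\boldsymbol{\Phi}) \cdot e^{-\frac12 \mathbf{x}\boldsymbol{\Phi}\mathbf{x}^*} \cdot \frac{\partial(\mathbf{x})}{\partial(\mathbf{x}')} = \gamma(\boldsymbol{\Phi})\, |\Delta(\mathbf{A})|\, e^{-\frac12 \mathbf{x}'\boldsymbol{\Phi}'\mathbf{x}'^*},$

and therefore we must have

$\gamma(\boldsymbol{\Phi}') = \gamma(\boldsymbol{\Phi})\, |\Delta(\mathbf{A})|.$

For $\mathbf{A} = \boldsymbol{\Phi}^{-1/2}$ we get $\boldsymbol{\Phi}' = \mathbf{1}$ and consequently

$\gamma(\boldsymbol{\Phi}) = \sqrt{\Delta(\boldsymbol{\Phi})} \cdot \gamma(\mathbf{1}),$

where obviously

$\gamma(\mathbf{1}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{k}.$

Considering

$\mathbf{M}(\boldsymbol{\Phi}) = \int \mathbf{x}^*\mathbf{x}\, p\{\mathbf{x}\}\, d\mathbf{x},$

² Exists because $\mathbf{M}$ is positive definite. Let $\mathbf{M} = \mathbf{O}\mathbf{D}\mathbf{O}^*$ where $\mathbf{O}$ is orthogonal and $\mathbf{D}$
the diagonal form of $\mathbf{M}$, then $\mathbf{M}^{1/2} = \mathbf{O}\mathbf{D}^{1/2}\mathbf{O}^*$ is real and symmetrical.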





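the same transformation gives

$\mathbf{M}(\boldsymbol{\Phi}) = \int \mathbf{A}^*\mathbf{x}'^*\mathbf{x}'\mathbf{A}\, p\{\mathbf{x}'\}\, d\mathbf{x}' = \mathbf{A}^*\mathbf{M}(\boldsymbol{\Phi}')\mathbf{A}$

which for $\mathbf{A} = \boldsymbol{\Phi}^{-1/2}$ leaves us with

$\mathbf{M}(\boldsymbol{\Phi}) = \boldsymbol{\Phi}^{-1}$

because $\mathbf{M}(\mathbf{1}) = \mathbf{1}$.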

REFERENCES 

[1] J. Wishart, "The generalised product moment distribution in samples from a normal multivariate population", Biometrika, Vol. 20A (1928), p. 32.

[2] J. Wishart and M. S. Bartlett, "The generalised product moment distribution in a normal system", Proc. Cambr. Phil. Soc., Vol. 29 (1933), p. 260.

[3] P. L. Hsu, "A new proof of the joint product moment distribution", Proc. Cambr. Phil. Soc., Vol. 35 (1939), p. 330.

[4] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, pp. 392-93.


THE DISTRIBUTION OF A DEFINITE QUADRATIC FORM 

By Herbert Robbins 
University of North Carolina 

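Let $X_1, \ldots, X_n$ be independent normal variates with zero means and unit
variances, let $a_1, \ldots, a_n$ be positive constants and define

(1) $U_n = \tfrac12 a_1 X_1^2 + \cdots + \tfrac12 a_n X_n^2,$

(2) $F_n(x) = \Pr[U_n \le x], \quad f_n(x) = F_n'(x).$

Setting

(3) $a = (a_1 \cdots a_n)^{1/n}$

and using the convolution formula we may show by induction that for x > 0,

(4) $f_n(x) = a^{-n/2}\, x^{n/2-1} \sum_{k=0}^{\infty} \frac{c_k(-x)^k}{\Gamma(\frac12 n + k)},$

(5) $F_n(x) = a^{-n/2}\, x^{n/2} \sum_{k=0}^{\infty} \frac{c_k(-x)^k}{\Gamma(\frac12 n + k + 1)},$

where for $k = 0, 1, \cdots$

(6) $c_k = \pi^{-n/2} \sum_{i_1 + \cdots + i_n = k} \frac{\Gamma(i_1 + \frac12) \cdots \Gamma(i_n + \frac12)}{i_1! \cdots i_n!}\, a_1^{-i_1} \cdots a_n^{-i_n}.$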





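In particular, if $a_1 = \cdots = a_n = 2$, then using the known distribution of $\chi^2$ with
n degrees of freedom we have

$f_n(x) = \frac{x^{n/2-1} e^{-x/2}}{2^{n/2}\,\Gamma(\frac12 n)} = \frac{x^{n/2-1}}{2^{n/2}\,\Gamma(\frac12 n)} \sum_{k=0}^{\infty} \frac{(-x)^k}{2^k k!} = \frac{x^{n/2-1}}{2^{n/2}} \sum_{k=0}^{\infty} \frac{c_k(-x)^k}{\Gamma(\frac12 n + k)},$

so that

$c_k = \frac{\Gamma(\frac12 n + k)}{2^k\, k!\, \Gamma(\frac12 n)} = \frac{\pi^{-n/2}}{2^k} \sum_{i_1 + \cdots + i_n = k} \frac{\Gamma(i_1 + \frac12) \cdots \Gamma(i_n + \frac12)}{i_1! \cdots i_n!},$

and therefore

(7) $\pi^{-n/2} \sum_{i_1 + \cdots + i_n = k} \frac{\Gamma(i_1 + \frac12) \cdots \Gamma(i_n + \frac12)}{i_1! \cdots i_n!} = \frac{\Gamma(\frac12 n + k)}{k!\, \Gamma(\frac12 n)}.$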


Now in the general case let 


(8) 


a = min {ai, • • • , On}; 


then from (6) and (7) we deduce that


(9) 


Ck(—x)’‘ 

r(4a + k) 


<SeML 

~ r(in)kl ’ 


with a similar inequality for the general term of (5). 

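It is difficult to evaluate numerically the coefficient $c_k$ by a direct application
of the definition (6). We shall therefore give a method by which the $c_k$ may be
computed easily. We shall assume in what follows that the $a_i$ are distinct.

Let $Y_1, \ldots, Y_n$ be another set of variates with the same joint distribution
as the $X_i$ and independent of the $X_i$, and set

(10) $V_{2n} = \tfrac12 a_1 X_1^2 + \cdots + \tfrac12 a_n X_n^2 + \tfrac12 a_1 Y_1^2 + \cdots + \tfrac12 a_n Y_n^2,$

(11) $G_{2n}(x) = \Pr[V_{2n} \le x], \quad g_{2n}(x) = G_{2n}'(x).$

Then by the convolution formula,

(12) $g_{2n}(x) = \int_0^x f_n(x - y) f_n(y)\, dy = a^{-n}\, x^{n-1} \sum_{k=0}^{\infty} \left\{ \sum_{i=0}^{k} c_i c_{k-i} \right\} \frac{(-x)^k}{\Gamma(k + n)}.$

But we may show directly that, setting

(13) $q_i = \prod_{j \ne i} (a_j - a_i)^{-1} \quad (i = 1, \ldots, n),$

we have

(14) $g_{2n}(x) = (-1)^{n-1} \sum_{i=1}^{n} q_i a_i^{n-2} e^{-x/a_i} = (-1)^{n-1} \sum_{k=0}^{\infty} \left\{ \sum_{i=1}^{n} q_i a_i^{n-k-2} \right\} \frac{(-x)^k}{k!}.$

Equating coefficients in (12) and (14) we derive the fundamental formula

(15) $\sum_{i=0}^{k} c_i c_{k-i} = a^{n} \sum_{j=1}^{n} q_j a_j^{-k-1} \quad (k = 0, 1, \cdots).$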





Thus, defining

(16) $2P_k = a^{n} \sum_{i=1}^{n} q_i a_i^{-k-1},$

we may write

(17) $\sum_{i=0}^{k} c_i c_{k-i} = 2P_k.$

From (6) or (17) it follows that

(18) $c_0 = 1.$

Thus we may solve (17) successively for the $c_k$ in terms of the $P_k$:

(19) $c_{2j} = P_{2j} - \{c_1 c_{2j-1} + c_2 c_{2j-2} + \cdots + c_{j-1} c_{j+1} + \tfrac12 c_j^2\} \quad (j = 1, 2, \cdots),$
     $c_{2j+1} = P_{2j+1} - \{c_1 c_{2j} + c_2 c_{2j-1} + \cdots + c_j c_{j+1}\} \quad (j = 0, 1, \cdots).$

When the n constants $q_1, \ldots, q_n$ have been computed we may compute the
$P_k$ by (16) and then the $c_k$ by (19) successively as far as desired. The inequality
(9) gives an indication of the number of terms of the alternating series (4) or (5)
which are necessary to secure a desired accuracy. A sharper result than (9)
should certainly be possible when the $a_i$ are well separated.
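The successive solution of (17) can be sketched in a few lines. The sketch below rests on several assumptions that should be kept in mind: it takes $U_n = \tfrac12(a_1X_1^2 + \cdots + a_nX_n^2)$, so that $\left(\sum_k c_k(-u)^k\right)^2 = \prod_i (1 + u/a_i)^{-1}$; it obtains the quantities $2P_k$ as the complete homogeneous sums of the $1/a_i$ rather than through the constants $q_i$, which also covers equal $a_i$; and the function names are illustrative. It then evaluates the power series for $F_n(x)$ with the normalizing factor $(a_1 \cdots a_n)^{-1/2}$.

```python
from math import exp, gamma, prod

def series_coefficients(a, terms):
    """c_0 .. c_{terms-1}, with (sum_k c_k (-u)^k)^2 = prod_i (1 + u/a_i)^(-1)."""
    h = [1.0] + [0.0] * (terms - 1)       # h[k] plays the role of 2*P_k
    for ai in a:
        b = 1.0 / ai
        for k in range(1, terms):         # multiply the generating function by 1/(1 - b*t)
            h[k] += b * h[k - 1]
    c = [1.0]                             # c_0 = 1
    for k in range(1, terms):
        cross = sum(c[i] * c[k - i] for i in range(1, k))
        c.append((h[k] - cross) / 2.0)    # solve sum_i c_i c_{k-i} = 2*P_k for c_k
    return c

def cdf(x, a, terms=80):
    """Power series for Pr[U_n <= x], U_n = (a_1 X_1^2 + ... + a_n X_n^2) / 2."""
    n = len(a)
    c = series_coefficients(a, terms)
    total = sum(ck * (-x) ** k / gamma(n / 2 + k + 1) for k, ck in enumerate(c))
    return prod(a) ** -0.5 * x ** (n / 2) * total

if __name__ == "__main__":
    # With a = (1, 1, 3, 3), U_4 is the sum of two independent exponential
    # variates with means 1 and 3, whose distribution function is elementary.
    x = 1.5
    print(cdf(x, [1.0, 1.0, 3.0, 3.0]), 1 - 1.5 * exp(-x / 3) + 0.5 * exp(-x))
```

The series alternates, so for large x relative to the smallest $a_i$ more terms are needed and cancellation limits the attainable accuracy.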

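The foregoing method may be modified to cover cases in which some of the
$a_i$ are equal, the formulas (16) being replaced by the appropriate limits as the
$a_i$ approach equality.

We shall now derive an expansion of $f_n(x)$ and $F_n(x)$ in terms of $\chi^2$
distributions. Let us set for x > 0,

(20) $f_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k\, \frac{x^{\frac12 n + k - 1} e^{-x/a}}{a^{\frac12 n + k}\, \Gamma(\frac12 n + k)},$

or, equivalently,

(21) $f_n(x) = \frac{2}{a} \sum_{k=0}^{\infty} (-1)^k d_k\, \frac{(2x/a)^{\frac12 n + k - 1} e^{-x/a}}{2^{\frac12 n + k}\, \Gamma(\frac12 n + k)} = \frac{x^{\frac12 n - 1} e^{-x/a}}{a^{\frac12 n}\, \Gamma(\frac12 n)} \sum_{k=0}^{\infty} (-1)^k d_k \left(\frac{x}{a}\right)^{k} \frac{\Gamma(\frac12 n)}{\Gamma(\frac12 n + k)},$

where the $d_k$ are to be determined. It follows, after some reduction, that

(22) $g_{2n}(x) = a^{-n}\, x^{n-1} e^{-x/a} \sum_{k=0}^{\infty} \left\{ \sum_{i=0}^{k} d_i d_{k-i} \right\} \frac{(-x)^k}{a^k\, \Gamma(n + k)}.$

But we may write (14) in the form

(23) $g_{2n}(x) = a^{-n}\, x^{n-1} e^{-x/a} \sum_{k=0}^{\infty} \left\{ \sum_{i=1}^{n} a\, q_i a_i^{n-k-2} (a - a_i)^{k} \right\} \frac{(-x)^{k-n+1}}{a^{k-n+1}\, \Gamma(k + 1)}.$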




Equating coefficients in (22) and (23) and setting

(24) $2Q_k = a \sum_{i=1}^{n} q_i a_i^{-k-1} (a - a_i)^{n+k-1},$

we obtain the relations, $d_0 = 1$ and

(25) $\sum_{i=0}^{k} d_i d_{k-i} = 2Q_k \quad (k = 0, 1, \cdots),$

from which the $d_k$ may be computed as in (19). Equation (20) or (21) then
gives the expansion of $f_n(x)$ in a series of frequency functions. The
corresponding expansion of $F_n(x)$ is then

(26) $F_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k \int_0^{x} \frac{t^{\frac12 n + k - 1} e^{-t/a}}{a^{\frac12 n + k}\, \Gamma(\frac12 n + k)}\, dt$

or

(27) $F_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k \int_0^{2x/a} \frac{t^{\frac12 n + k - 1} e^{-t/2}}{2^{\frac12 n + k}\, \Gamma(\frac12 n + k)}\, dt.$

By direct comparison of (4) and (20) we may establish the following relations
among the $c_k$ and $d_k$:


(28) 



These may be used if both the power series and the $\chi^2$ series are desired.
From (6) we see directly that

(29) $c_1 = \frac12 \sum_{i=1}^{n} a_i^{-1},$

and from (28) it follows that

(30) $d_1 = \tfrac12 n a \delta_1,$

where we have set

(31) $\delta_1 = \left\{ \frac1n \sum_{i=1}^{n} a_i^{-1} \right\} - (a_1 \cdots a_n)^{-1/n}.$

Since by a well known inequality $\delta_1 \ge 0$ it follows that $d_1 \ge 0$, with equality only
if all the $a_i$ are equal. If we denote by K(x) the frequency function of
$\tfrac12 a(X_1^2 + \cdots + X_n^2)$ then





and hence (21) becomes

which is significant for x near 0. 


EXACT LOWER MOMENTS OF ORDER STATISTICS IN SMALL SAMPLES
FROM A NORMAL DISTRIBUTION

By Howard L. Jones
Illinois Bell Telephone Company

1. Summary. Exact means in samples of size $\le 3$, and exact second moments
and product-moments in samples of size $\le 4$, are given in Table 1 in terms of
$\pi$ for order statistics selected from the normal distribution N(0, 1). The
derivation employs multiple integration and some general properties of the moments.

TABLE 1

Expected values of lower moments of order statistics, x_i ≥ x_{i+1},
in samples of size n from the normal distribution N(0, 1).

Moment        n = 2        n = 3                  n = 4
E[x_1]        1/√π         3/(2√π)
E[x_2]                     0
E[x_1^2]      1            1 + √3/(2π)            1 + √3/π
E[x_2^2]                   1 - √3/π               1 - √3/π
E[x_1 x_2]    0            √3/(2π)                √3/π
E[x_1 x_3]                 -√3/π                  -(2√3 - 3)/π
E[x_1 x_4]                                        -3/π
E[x_2 x_3]                                        (2√3 - 3)/π
σ_1^2         1 - 1/π      1 - (9 - 2√3)/(4π)
σ_2^2                      1 - √3/π
σ_12          1/π          √3/(2π)
σ_13                       (9 - 4√3)/(4π)


2. Introduction. The usefulness of the lower moments of order statistics for
determining the moments of the range and for other purposes is well established.
In small samples, however, computation of the moments by quadrature is
laborious [1]. The values shown in Table 1 should therefore be helpful in problems
requiring the use of these moments for samples of size $\le 4$, since the constant $\pi$
has been evaluated to several hundred decimal places. Some of the methods
used to obtain these results may also be useful in approximating or verifying
the moments in larger samples.






3. Multiple integration. Let n random selections from the normal
distribution N(0, 1) be arranged in order of size so that

$x_1 \ge x_2 \ge \cdots \ge x_n.$

For samples of size 2, the means and product-moment are easily obtained from
the general formula

$E[x_i^k x_j^h] = n! \int_{-\infty}^{\infty} \int_{x_n}^{\infty} \cdots \int_{x_2}^{\infty} x_i^k x_j^h\, f(x_1) f(x_2) \cdots f(x_n)\, dx_1\, dx_2 \cdots dx_n, \quad f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$

$E[x_i^k]$ being the special case where h = 0. Multiple integration can also be
used to find any product-moment, $E[x_i x_j]$, for samples of size 3, the order of
integration being changed at any stage where necessary.

For the means in samples of size 3 and the product-moments in samples of
size 4, the integrals reduce to double integrals which can be evaluated from the
equation


L^Ju 2ab 


This equation follows from the fact that 


is equivalent to 


while the function 


r r ^ 

so ''bls/a ^ 

n dpi dpi , 

-P2 

/•ft/?/® 

f dti 

•’ll 


has the symmetrical property that </> (fc) — whence 

f 4>(h) dh — 0, 

J-oO 

4. Some properties of the moments. The most obvious property of the
moments of order statistics in samples from the normal distribution N(0, 1) is
their symmetry; thus:

$E[x_i] = -E[x_{n-i+1}],$

$E[x_i^2] = E[x_{n-i+1}^2],$

$E[x_i x_j] = E[x_{n-i+1}\, x_{n-j+1}].$

When sample values from any parent distribution are numbered in order of
random selection, $x'_i$ and $x'_j$ are statistically independent of each other, and
the expected value of a product $x_i'^{\,k} x_j'^{\,h}$ is the product of the expected values of $x_i'^{\,k}$
and $x_j'^{\,h}$. Numbering in order of size has the effect of increasing some expected
values and decreasing others, leaving the sum of expected values of a given type
unchanged, so that in general,


11—i 

L 


1-1 


i_t+i 



FArWSl 


where $x_0$ is a random selection. In particular, this equation holds for the special
cases (k = 1, h = 1), (k = 1, h = 0), and (k = 2, h = 0), so that in samples
from the normal distribution N(0, 1),

$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[x_i x_j] = \tfrac12 n(n-1)(E[x_0])^2 = 0,$

$\sum_{i=1}^{n} E[x_i] = nE[x_0] = 0,$

$\sum_{i=1}^{n} E[x_i^2] = nE[x_0^2] = n.$

The foregoing relationships lead immediately to the evaluation of $E[x_1 x_2]$ and
$E[x_1^2]$ in samples of size 2. (The generalization of these relationships was
suggested by Professor John H. Smith, whose unpublished manuscript on sampling
from a rectangular distribution has also been instructive.)

In samples from a normal distribution, the covariance of every order statistic
with the sample mean is the same as the variance of the sample mean. This
implies that the variance of the sample mean $\le$ the variance of any order
statistic, the ratio of one standard deviation to the other being equal to the
coefficient of correlation between the sample mean and the order statistic. To
derive these properties, consider the linear function

$m = w_1X_1 + w_2X_2 + \cdots + w_nX_n$

of the order statistics $X_1, X_2, \ldots, X_n$ in a sample selected from the normal
distribution $N(\mu, \sigma)$ with unknown $\mu$ and $\sigma$. Let

$x_i = (X_i - \mu)/\sigma, \quad i = 1, 2, \ldots, n.$

The conditions $w_1 + w_2 + \cdots + w_n = 1$ and $w_i = w_{n-i+1}$ are sufficient to make
m an unbiased estimate of $\mu$ with variance $\sigma^2 E[(w_1x_1 + w_2x_2 + \cdots + w_nx_n)^2]$.
The w's that make this variance minimum must satisfy the equations obtained
by replacing $w_i$ with $w_{n-i+1}$, for $i > \tfrac12(n + 1)$, in the expression

$E[(w_1x_1 + w_2x_2 + \cdots + w_nx_n)^2] + \lambda(w_1 + w_2 + \cdots + w_n - 1)$

and then setting the partial derivative with respect to each w equal to zero.
This leads to

$\sum_{j=1}^{n} w_j E[x_i x_j] + \sum_{j=1}^{n} w_j E[x_{n-i+1}\, x_j] + \lambda = 0, \quad 1 \le i \le n,$



where the summations include the terms $E[x_i^2]$ and $E[x_{n-i+1}^2]$, respectively. But
it is known [2] that the sample mean is the regular unbiased estimate of $\mu$ with
minimum variance. Setting each w equal to 1/n and combining equivalent
terms yields

$\frac{2}{n} \sum_{j=1}^{n} E[x_i x_j] + \lambda = 0, \quad i = 1, 2, \ldots, n.$

Summing from i = 1 to i = n, and employing the relationships discussed in the
preceding paragraph, we obtain

$2 + n\lambda = 0,$

whence

$\lambda = -2/n,$

and

$\sum_{j=1}^{n} E[x_i x_j] = 1, \quad i = 1, 2, \ldots, n,$

where the summation includes the term $E[x_i^2]$. This equation leads to the
properties mentioned at the beginning of this paragraph. The same equation can
be used to evaluate $E[x_1^2]$ and $E[x_2^2]$ in samples of size 3 or 4 from the distribution
N(0, 1), after the product-moments have been found.
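The relation $\sum_j E[x_i x_j] = 1$ and a few of the exact values for n = 3, namely $E[x_1] = 3/(2\sqrt{\pi})$ and $E[x_1x_2] = \sqrt{3}/(2\pi)$, can be checked by simulation. The following is only a sketch: the function name, the sample count and the seed are arbitrary choices made for illustration, and agreement is to be expected only within the sampling error.

```python
import random
from math import pi, sqrt

def order_moments(n=3, draws=100_000, seed=1):
    """Monte Carlo means E[x_i] and product-moments E[x_i x_j] of order statistics."""
    rng = random.Random(seed)
    mean = [0.0] * n
    prod = [[0.0] * n for _ in range(n)]
    for _ in range(draws):
        # arrange the sample so that x_1 >= x_2 >= ... >= x_n
        x = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        for i in range(n):
            mean[i] += x[i]
            for j in range(n):
                prod[i][j] += x[i] * x[j]
    mean = [m / draws for m in mean]
    prod = [[p / draws for p in row] for row in prod]
    return mean, prod

if __name__ == "__main__":
    mean, prod = order_moments()
    print("E[x_1]     ", mean[0], "exact", 3 / (2 * sqrt(pi)))
    print("E[x_1 x_2] ", prod[0][1], "exact", sqrt(3) / (2 * pi))
    print("row sums   ", [sum(row) for row in prod])   # each should be close to 1
```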


REPBRENCES 

[1] C. Hastings, Jr., F. Mosteller, J. W. Tukey, and C. P. Winsor, "Low moments for small samples: a comparative study of order statistics," Annals of Math. Stat., Vol. 18 (1947), pp. 413-426.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 483.


NOTE ON AN ASYMPTOTIC EXPANSION OF THE nTH DIFFERENCE
OF ZERO

By L. C. Hsu

National Tsing-Hua University, Peiping, China

This note gives an asymptotic expansion of the nth difference of zero. It is
known that the Stirling number $S_{n,t}$ of the second kind is defined by

(1) $n!\, S_{n,t} = \Delta^n 0^t = \sum_{x=0}^{n} (-1)^{n-x} \binom{n}{x} x^t.$

We shall first show that the Stirling number $S_{n,n+k}$ can be expanded in the
form

(2) $S_{n,n+k} = \frac{n^{2k}}{2^k k!} \left[ 1 + \frac{f_1(k)}{n} + \frac{f_2(k)}{n^2} + \cdots + \frac{f_t(k)}{n^t} + O(n^{-t-1}) \right],$

where $f_1, f_2, \ldots, f_t$ are polynomials in k whose coefficients can be found
by means of the following lemmas.

The first lemma is due to B. F. Kimball [1, (5.3)].

Lemma 1. (Kimball) Let q be a real number such that n + q > 0, and let
$f(x) = x^{n+q}$. Then we can write $\Delta^n f(x)$ in the form

(3) - /"’fa + in) [l + t inm, «)], 

where the value of W(m, n) is given by

(4) W(m, n)


$B_\nu^{(-n)}(x)$ being a so-called Bernoulli polynomial of negative order which was first
defined by Nörlund [2].

Lemma 2. Let the sum of all $\binom{n}{k}$ products of k different numbers taken from
the set (1, 2, ..., n) be denoted by $S_k(n)$. Then we can express it in the form

(5) $S_k(n) = \sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \binom{n+p}{k+p},$

where the coefficients $\lambda_1(k), \lambda_2(k), \ldots$ satisfy the recurrence relation

(6) $(k + p)\lambda_{p-1}(k) + p\,\lambda_p(k) = \lambda_p(k + 1)$

with $\lambda_0 = 0$, $\lambda_1 = 1$ and $\lambda_{k+1}(k) = 0$.

Proof. Clearly, among all $\binom{n}{k+1}$ products of (k + 1) numbers out of
(1, 2, ..., n), there are exactly $\binom{n-1}{k}$ products containing the greatest factor n.
The sum of these products is therefore $n \cdot S_k(n - 1)$. Repeating this reasoning,
we get

(7) $S_{k+1}(n) = n \cdot S_k(n - 1) + (n - 1) \cdot S_k(n - 2) + \cdots + (k + 1) \cdot S_k(k).$

Evidently, (5) is true for k = 1. Suppose now that it is true for k = k. Then
the right-hand side of (7) can be written as

M-O P-1 \ k -T p J 

. E [fa+p +1)(»+ '+J) - p (4^;,)] 

= § i-ir^^'[{k +p)x,_i(fc) -f +t+1)' 

The lemma thus follows by induction on k. 





The number $S_k(n)$ may be called a Stirling number of the first kind. By the
lemma just proved, it is easy to find

(8) $S_2(n) = 3\binom{n+2}{4} - \binom{n+1}{3},$
    $S_3(n) = 15\binom{n+3}{6} - 10\binom{n+2}{5} + \binom{n+1}{4},$
    $S_4(n) = 105\binom{n+4}{8} - 105\binom{n+3}{7} + 25\binom{n+2}{6} - \binom{n+1}{5}.$

We shall see that in order to compute the coefficients of $f_1(k), \ldots, f_t(k)$, it is
sufficient to compute the values of W(m, n), $\lambda_1(m), \lambda_2(m), \ldots$ (m = 1, 2, ..., t).
Let $f(x) = x^{n+k}$. Then by lemma 1, we have

»'s„« -[^.*++£( 2ty(’".«)]- 

From the definition of $S_k(n)$ it is easily seen that
$(n + k)(n + k - 1) \cdots (n + 1) = n^k + n^{k-1} S_1(k) + \cdots + n S_{k-1}(k) + S_k(k).$
Hence we may write


1 r d” , t (. . S^ik) m) Skik)\ 

It is clear from Kimball’s paper [1] that 


Wim,n) +0(n"‘~'). 


Substituting, we obtain

= III [’+£ (L) ")+ 


(9) 


1+ i, ^+ o(7i“‘"') 


m -1 vr 


= ra [’ +.?, (L) "> + 

. r 1 + V V f ^ + 0(ri-‘-'ll 





The last expression shows that the asymptotic expansion (2) can be obtained
by computing the numbers $\lambda_p(m)$, W(m, n) with $1 \le p \le m \le t$. For example,
consider the case t = 3 and notice that [1, (2.13)]



and that $\lambda_1 = 1$, $\lambda_2(2) = 3$, $\lambda_2(3) = 10$, $\lambda_3(3) = 15$. Then by a straightforward
calculation of the right-hand side of (9) and by comparison with (2), we find

(10) $f_1(k) = \tfrac13 (2k^2 + k),$
     $f_2(k) = \tfrac{1}{18} (4k^4 - k^2 - 3k),$
     $f_3(k) = \tfrac{1}{810} (40k^6 - 60k^5 - 2k^4 - 63k^3 + 133k^2 - 48k).$

Finally, combining (2) with the well-known Stirling's formula [3]

(11) $n! = \sqrt{2\pi}\, n^{n+\frac12} e^{-n} \left[ 1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O(n^{-4}) \right]$

and noting (1), we obtain

(12.1) $\Delta^n 0^{n+k} = \frac{\sqrt{2\pi}\, n^{n+2k+\frac12} e^{-n}}{2^k k!} \left[ 1 + \frac{g_1(k)}{n} + \frac{g_2(k)}{n^2} + \frac{g_3(k)}{n^3} + O(n^{-4}) \right]$

where $g_1(k)$, $g_2(k)$, $g_3(k)$ are polynomials in k, viz.

(12.2) $g_1(k) = \tfrac{1}{12} (8k^2 + 4k + 1),$
       $g_2(k) = \tfrac{1}{288} (64k^4 - 40k + 1),$
       $g_3(k) = \tfrac{1}{51840} (2560k^6 - 3840k^5 + 832k^4 - 4032k^3 + 8392k^2 - 3732k - 139).$

The asymptotic formula of $\Delta^n 0^{n+k}$ just derived is much better than a result
previously obtained [4]. Moreover, it may be noted that the asymptotic
expansion of $\Delta^n 0^{n+k}$ may be made as sharp as desired, since in fact, for any
prescribed $t \ge 1$, $\lambda_p(m)$ and W(m, n) $(1 \le m \le t)$ may be easily computed by
(6) and Kimball's [1, (2.12)] respectively.
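The three-term expansion can be compared with exact values obtained from the difference formula (1). The sketch below assumes that the expansion has the form $S_{n,n+k} = \frac{n^{2k}}{2^k k!}\,[1 + f_1(k)/n + f_2(k)/n^2 + f_3(k)/n^3 + O(n^{-4})]$ with $f_1(k) = (2k^2+k)/3$, $f_2(k) = (4k^4-k^2-3k)/18$ and $f_3(k) = (40k^6-60k^5-2k^4-63k^3+133k^2-48k)/810$; the numerical factors 1/18 and 1/810 and the coefficient of $k^4$ in $f_3$ are readings of a damaged print that were checked against exact values for k = 1, 2, 3, so they should be treated as assumptions. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def stirling_by_differences(n, k):
    """S_{n,n+k} = (1/n!) * sum_x (-1)^(n-x) C(n,x) x^(n+k), in exact integers."""
    total = sum((-1) ** (n - x) * comb(n, x) * x ** (n + k) for x in range(n + 1))
    return total // factorial(n)

def three_term_expansion(n, k):
    """Leading factor n^(2k)/(2^k k!) times 1 + f1/n + f2/n^2 + f3/n^3."""
    f1 = Fraction(2 * k**2 + k, 3)
    f2 = Fraction(4 * k**4 - k**2 - 3 * k, 18)
    f3 = Fraction(40 * k**6 - 60 * k**5 - 2 * k**4 - 63 * k**3 + 133 * k**2 - 48 * k, 810)
    lead = Fraction(n ** (2 * k), 2**k * factorial(k))
    return lead * (1 + f1 / n + f2 / n**2 + f3 / n**3)

if __name__ == "__main__":
    for n in (20, 200):
        exact = stirling_by_differences(n, 3)
        approx = three_term_expansion(n, 3)
        print(n, exact, float(approx), float(Fraction(exact) / approx - 1))
```

The relative error printed in the last column should fall off like $n^{-4}$ as n increases.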

RKFEKENGES 

[1] B. F. Kimball, "The application of Bernoulli polynomials of negative order to differencing," Amer. Math. Jour., Vol. 55 (1933), pp. 399-416.

[2] N. E. Nörlund, Differenzenrechnung, Julius Springer, 1924, p. 138.

[3] E. T. Whittaker and G. N. Watson, Modern Analysis, Cambridge Univ. Press, p. 253.

[4] L. C. Hsu, "Some combinatorial formulas with applications to mathematical expectations and to differences of zero," Annals of Math. Stat., Vol. 15 (1944).


AN INEQUALITY FOR KURTOSIS 

By Louis Guttman 
Cornell University 

1. Summary. It is well known that, if the fourth moment about the mean
of a frequency distribution equals the square of the variance, then the frequencies
are piled up at exactly two points, namely, the two points that are one standard
deviation away from the mean. In this paper is developed a general inequality
which describes the piling up of frequency around these two points for the case
where the fourth moment exceeds the square of the variance. In a sense, it is
shown how "U-shaped" a distribution must be according to its second and fourth
moments.

2. An inequality. Let x be a random variable whose distribution has the
following moments:

(1) $\mu = E(x), \quad \sigma^2 = E(x - \mu)^2, \quad (\alpha^2 + 1)\sigma^4 = E(x - \mu)^4.$

$\alpha^2$ is non-negative for any distribution, and its positive square root will be denoted
by $\alpha$. Let

(2) $t = (x - \mu)/\sigma.$

It will be shown that, if $\lambda$ is an arbitrary positive number, then

(3) $\mathrm{Prob}\{1 - \lambda\alpha \le t^2 \le 1 + \lambda\alpha\} > 1 - \lambda^{-2}.$

If $\lambda$ is chosen so as to make the left member in the braces positive, then t is
bounded away from zero, and (3) becomes:

(4) $\mathrm{Prob}\{\sqrt{1 - \lambda\alpha} \le |t| \le \sqrt{1 + \lambda\alpha}\} > 1 - \lambda^{-2} \quad (\lambda\alpha < 1).$

For example, if $\alpha = .5$ and $\lambda = \sqrt{2}$, then (4) shows that the probability is
greater than .50 that t is either between .54 and 1.30, or between -1.30 and -.54.
If $\alpha = .2$ and $\lambda = 3$, then (4) shows that the probability is greater than .88 that
t is either between .63 and 1.27, or between -1.27 and -.63. In general, the
smaller $\alpha$ is, the greater the probability that t is in a small interval around +1 or
-1. In particular, if $\alpha = 0$, then $\lambda$ may be taken arbitrarily large, so that (4)
shows that the probability is unity that $t = \pm 1$; this is the well known case
referred to above.
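Inequality (3) can be tried out on any finite set of numbers by treating the set as a distribution in its own right, with $\mu$, $\sigma^2$ and $\alpha$ computed from the numbers themselves. The following sketch does this; the function name and the choice of test data are illustrative assumptions, and the comparison uses the weak form (proportion at least $1 - \lambda^{-2}$), which is what Tchebychef's inequality guarantees for such a set.

```python
import random
from math import sqrt

def kurtosis_bound_check(sample, lam):
    """Return (alpha, proportion with 1 - lam*alpha <= t^2 <= 1 + lam*alpha, 1 - lam^-2)."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    m4 = sum((x - mu) ** 4 for x in sample) / n
    # (alpha^2 + 1) * sigma^4 = fourth moment about the mean
    alpha = sqrt(max(m4 / var**2 - 1.0, 0.0))
    lo, hi = 1 - lam * alpha, 1 + lam * alpha
    inside = sum(1 for x in sample if lo <= (x - mu) ** 2 / var <= hi) / n
    return alpha, inside, 1 - lam ** -2

if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [rng.random() for _ in range(5000)]   # rectangular data, alpha near sqrt(0.8)
    print(kurtosis_bound_check(uniform, 1.5))
    print(kurtosis_bound_check([-1.0, 1.0] * 10, 3.0))   # two-point case, alpha = 0
```

For the two-point data the whole frequency sits at $t = \pm 1$, in line with the case $\alpha = 0$ discussed above.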





3. Derivation. Inequality (3) is a special case of a slightly more general
inequality which follows very simply from that of Tchebychef. Consider the
function $t^2 - 1 + c$, where c is an arbitrary real number. By using (1) and (2),
it is seen that

(5) $E(t^2 - 1 + c)^2 = \alpha^2 + c^2.$

Then, according to Tchebychef's inequality, if $\lambda$ is an arbitrary positive number,

(6) $\mathrm{Prob}\{(t^2 - 1 + c)^2 \le \lambda^2(\alpha^2 + c^2)\} > 1 - \lambda^{-2},$

or,

(7) $\mathrm{Prob}\{1 - c - \lambda\sqrt{\alpha^2 + c^2} \le t^2 \le 1 - c + \lambda\sqrt{\alpha^2 + c^2}\} > 1 - \lambda^{-2}.$

This is the general inequality that was to be shown.

Inequality (3) is obtained by setting c = 0 in (7).

Another special case is obtained by determining c so as to maximize the left
member in the braces of (7). By differentiation, the maximizing value is found
to be $c = -\alpha/\sqrt{\lambda^2 - 1}$, for which (7) becomes:

(8) $\mathrm{Prob}\{1 - \alpha\theta \le t^2 \le 1 + \alpha(\theta^2 + 2)/\theta\} > 1 - 1/(\theta^2 + 1),$

where $\theta$ is used instead of the notation $\sqrt{\lambda^2 - 1}$, and denotes any positive
number. For the same probability on the right, (8) has the advantage over (3) of
having $1 - \alpha\theta$ greater than $1 - \lambda\alpha$, so that the former may be positive even
though the latter is negative. Inequality (8) starts the positive interval for t as
close to +1 as possible. On the other hand, (3) provides the minimum size interval
for $t^2$ from among all values of c that make the left member in the braces of (7)
positive.

If it is desired to have the positive interval for t end as close to +1 as possible,
then the right member in the braces of (7) is to be minimized. By
differentiation, the minimizing value is found to be $c = \alpha/\sqrt{\lambda^2 - 1}$, and the minimum
inequality is:

(9) $\mathrm{Prob}\{1 - \alpha(\theta^2 + 2)/\theta \le t^2 \le 1 + \alpha\theta\} > 1 - 1/(\theta^2 + 1).$

4. Distribution Around $\mu$. If the left member in the braces of (7) is negative,
then instead of giving information about the piling up of probability of t around
+1 and -1, (7) provides a statement about the probability in an interval around
$\mu$. Alternatively, this may be regarded as a confidence interval for $\mu$. The
minimum interval is given by (9); actually, it holds regardless of the value of the
left member in the braces; another way of stating it is:

(10) $\mathrm{Prob}\{-\sqrt{1 + \alpha\theta} \le t \le \sqrt{1 + \alpha\theta}\} > 1 - 1/(\theta^2 + 1).$





TABLE FOR ESTIMATING THE GOODNESS OF FIT OF EMPIRICAL 

DISTRIBUTIONS 

By N. Smienov 

1. Editorial Note. The table presented on pp. 280-281 was originally published
in [1]. It gives values of

$$L(z) = 1 - 2\sum_{\nu=1}^{\infty} (-1)^{\nu-1} e^{-2\nu^2 z^2},$$

which is also derived in [2].

Let $(X_1, \cdots, X_n)$ be a sample of independent variables with the same
continuous cumulative distribution function $F(x)$, and let $N(z)$ be the number of
$X_k$ which are $< z$. By empirical distribution is meant the step-function
$F_n(z) = N(z)/n$. The maximum $D_n$ of the difference $|F_n(z) - F(z)|$ is a
random variable and $L(z)$ is the limiting cumulative distribution function of
$\sqrt{n}\,D_n$. If $D_{m,n}$ is the maximum of the difference $|F_m(z) - F_n(z)|$ between
the empirical distributions of two independent samples of sizes $m$ and $n$, respectively,
then $L(z)$ is also the limiting cumulative distribution function of
$(mn/(m + n))^{1/2} D_{m,n}$.
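As an illustration of how entries of such a table can be checked, the following is a minimal Python sketch that sums the alternating series $1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2z^2}$ directly. The function name `kolmogorov_limit_cdf`, the tolerance, the term cap, and the small-z cutoff are assumptions made for this sketch and are not part of the original note; the values it returns should agree with the tabulated ones to about five decimal places.

```python
import math


def kolmogorov_limit_cdf(z: float, tol: float = 1e-16, max_terms: int = 1000) -> float:
    """Evaluate L(z) = 1 - 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * z**2)."""
    # Assumption of this sketch: for z below 0.05 the limiting value is far
    # smaller than any digit shown in the table, and the series converges
    # slowly there, so 0 is returned directly.
    if z < 0.05:
        return 0.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * z * z)
        # Odd k enter with a plus sign, even k with a minus sign.
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
    # Clamp to [0, 1] to guard against rounding in the alternating sum.
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


if __name__ == "__main__":
    for z in (0.50, 1.00, 1.50, 2.00):
        print(f"z = {z:.2f}   L(z) = {kolmogorov_limit_cdf(z):.6f}")
```

Direct summation is adequate here because the terms decay like $e^{-2k^2z^2}$; only a handful are needed for z near 1, and fewer than a hundred at the cutoff z = 0.05.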


REFERENCES 

[1] N. Smirnov, “On the estimation of the discrepancy between empirical curves of
distribution for two independent samples,” Bulletin Mathématique de l'Université
de Moscou, Vol. 2 (1939), fasc. 2.

[2] W. Feller, “On the Kolmogorov-Smirnov limit theorems for empirical distributions,”
Annals of Math. Stat., Vol. 19 (1948), pp. 177-189.



TABLE of L(z)

   z      L(z)          z      L(z)          z      L(z)

  .28   .000 001
  .29   .000 004       .69   .272 189      1.09   .814 342
  .30   .000 009       .70   .288 765      1.10   .822 282
  .31   .000 021       .71   .305 471      1.11   .829 950
  .32   .000 046       .72   .322 265      1.12   .837 356
  .33   .000 091       .73   .339 113      1.13   .844 502
  .34   .000 171       .74   .355 981      1.14   .851 394
  .35   .000 303       .75   .372 833      1.15   .858 038
  .36   .000 511       .76   .389 640      1.16   .864 442
  .37   .000 826       .77   .406 372      1.17   .870 612
  .38   .001 285       .78   .423 002      1.18   .876 548
  .39   .001 929       .79   .439 505      1.19   .882 258
  .40   .002 808       .80   .455 857      1.20   .887 750
  .41   .003 972       .81   .472 041      1.21   .893 030
  .42   .005 476       .82   .488 030      1.22   .898 104
  .43   .007 377       .83   .503 808      1.23   .902 972
  .44   .009 730       .84   .519 366      1.24   .907 648
  .45   .012 590       .85   .534 682      1.25   .912 132
  .46   .016 005       .86   .549 744      1.26   .916 432
  .47   .020 022       .87   .564 546      1.27   .920 556
  .48   .024 682       .88   .579 070      1.28   .924 505
  .49   .030 017       .89   .593 316      1.29   .928 288
  .50   .036 055       .90   .607 270      1.30   .931 908
  .51   .042 814       .91   .620 928      1.31   .935 370
  .52   .050 306       .92   .634 286      1.32   .938 682
  .53   .058 534       .93   .647 338      1.33   .941 848
  .54   .067 497       .94   .660 082      1.34   .944 872
  .55   .077 183       .95   .672 516      1.35   .947 756
  .56   .087 577       .96   .684 636      1.36   .950 512
  .57   .098 656       .97   .696 444      1.37   .953 142
  .58   .110 395       .98   .707 940      1.38   .955 650
  .59   .122 760       .99   .719 126      1.39   .958 040
  .60   .135 718      1.00   .730 000      1.40   .960 318
  .61   .149 229      1.01   .740 566      1.41   .962 486
  .62   .163 255      1.02   .750 826      1.42   .964 552
  .63   .177 753      1.03   .760 780      1.43   .966 516
  .64   .192 677      1.04   .770 434      1.44   .968 382
  .65   .207 987      1.05   .779 794      1.45   .970 158
  .66   .223 637      1.06   .788 860      1.46   .971 846
  .67   .239 582      1.07   .797 636      1.47   .973 448
  .68   .255 780      1.08   .806 128      1.48   .974 970


TABLE of L(z) (Concluded)

   z      L(z)          z      L(z)          z      L(z)

 1.49   .976 412      1.89   .998 421      2.29   .999 944
 1.50   .977 782      1.90   .998 536      2.30   .999 949
 1.51   .979 080      1.91   .998 644      2.31   .999 954
 1.52   .980 310      1.92   .998 744      2.32   .999 958
 1.53   .981 476      1.93   .998 837      2.33   .999 962
 1.54   .982 578      1.94   .998 924      2.34   .999 965
 1.55   .983 622      1.95   .999 004      2.35   .999 968
 1.56   .984 610      1.96   .999 079      2.36   .999 970
 1.57   .985 544      1.97   .999 149      2.37   .999 973
 1.58   .986 426      1.98   .999 213      2.38   .999 976
 1.59   .987 260      1.99   .999 273      2.39   .999 978
 1.60   .988 048      2.00   .999 329      2.40   .999 980
 1.61   .988 791      2.01   .999 380      2.41   .999 982
 1.62   .989 492      2.02   .999 428      2.42   .999 984
 1.63   .990 154      2.03   .999 474      2.43   .999 986
 1.64   .990 777      2.04   .999 516      2.44   .999 987
 1.65   .991 364      2.05   .999 552      2.45   .999 988
 1.66   .991 917      2.06   .999 588      2.46   .999 989
 1.67   .992 438      2.07   .999 620      2.47   .999 990
 1.68   .992 928      2.08   .999 650      2.48   .999 991
 1.69   .993 389      2.09   .999 680      2.49   .999 992
 1.70   .993 823      2.10   .999 706      2.50   .999 9925
 1.71   .994 230      2.11   .999 728      2.55   .999 9956
 1.72   .994 612      2.12   .999 750      2.60   .999 9974
 1.73   .994 972      2.13   .999 770      2.65   .999 9984
 1.74   .995 309      2.14   .999 790      2.70   .999 9990
 1.75   .995 625      2.15   .999 806      2.75   .999 9994
 1.76   .995 922      2.16   .999 822      2.80   .999 9997
 1.77   .996 200      2.17   .999 838      2.85   .999 99982
 1.78   .996 460      2.18   .999 852      2.90   .999 99990
 1.79   .996 704      2.19   .999 864      2.95   .999 99994
 1.80   .996 932      2.20   .999 874      3.00   .999 99997
 1.81   .997 146      2.21   .999 886
 1.82   .997 346      2.22   .999 896
 1.83   .997 533      2.23   .999 904
 1.84   .997 707      2.24   .999 912
 1.85   .997 870      2.25   .999 920
 1.86   .998 023      2.26   .999 926
 1.87   .998 165      2.27   .999 934
 1.88   .998 297      2.28   .999 940





BOOK REVIEW 

Fundamentals of Statistics. Truman Lee Kelley. Harvard University Press,
1947; pp. xvi, 765. $10.00.

Reviewed by A. M. Mood

Iowa State College

First, a brief look at the contents: introductory matter, broad classifications
of types of data, quantitative and qualitative aspects of data, construction of
tables, charts, and graphs (200 pages); location and scale parameters, and
moments (76 pages); normal distribution (30 pages); exact sampling distributions
based on normal theory (5 pages); binomial distribution, goodness of fit tests,
contingency tables, normal approximation to the distribution of the variance
ratio, properties of Chi-square (20 pages); correlation and regression (160 pages).

These first 480 pages constitute the essential part of the book and the part that 
will be commented on here. But there are 270 more pages, the content of which 
we shall merely note without comment. There is a chapter of 90 pages entitled 
“Sundry Statistical Issues and Procedures” which discusses fifteen issues such as 
periodicity, time series, curve fitting, variance error of a coefficient corrected for
attenuation, machine extraction of square roots, and sequential analysis. There 
follows a chapter of 40 pages devoted to no less than twenty-three topics in 
mathematics, topics such as: matrices and determinants, the square root trans¬ 
formation, expanding a table, spaces of three or more dimensions, and Fourier 
series. The remaining 140 pages contain numerical tables, references, various 
indexes, and a test designed to measure the adequacy of students’ mathematical 
preparation. 

This then is another book which deals with the descriptive aspects of statistics.
Despite its title, it omits discussion of distribution theory, sampling theory,
the theory of estimation, tests of hypotheses, or the theory of probability. The
phrase “confidence interval” appears not once, I believe, in the entire 760 pages.
The discussion of Student’s distribution is brief enough to be quoted in its
entirety (page 284): “The t-distribution, shown through the courtesy of Dr.
Philip J. Rulon, in Chart VIIIII, is appropriate for interpreting the significance
of means, differences of means, and of regression coefficients, for small samples— 
say N less than 15. It is the distribution of these statistics computed from small 
samples drawn from a parent normal distribution.”

Thus the author denies any value to the developments in the fundamentals of
statistics during the past twenty-five or thirty years. He does this not merely
by implication but in so many words; referring to modern statistical inference,
he writes (page 13): “A still greater weakness is that it is essentially a deductive 
procedure and relatively sterile in suggesting new courses—inspiring creative 
inferences. It is fundamentally a method of proof and not one of invention.”


He is therefore fully aware of his extreme position, and takes great pains to 
justify it. His thesis is that the main purpose of statistics is to suggest new 
hypotheses to the scientist. In developing this thesis he writes (page 15): “The 
physicist observes seemingly irregular changes in x as y changes. He repeats
his experiment, controlling more and more of the conditions, and repeats again 
and again, and, if successful, he reaches a law at the end of his work. He has 
been using statistics.” But his discussion avoids certain relevant questions. 
Why does the physicist repeat the experiment? Why did he perform it in the 
first place? Did he suspect before he collected any data that x and y might be
related? 

At any rate, the opinion of most present-day statisticians is that the primary 
role of statistics in scientific research is statistical inference. This opinion is 
certainly well-founded in my own experience. Here at Iowa State College the 
Statistical Laboratory is intimately implicated in the research programs of all
departments—physical, biological, and social. These scientists perform their 
experiments with a specific purpose in mind—usually the estimation of some 
parameters, sometimes the testing of a hypothesis. They never seem to seek 
in a collection of data some new hypothesis by artful selection between the mean, 
the mode, the geometric mean, the harmonic mean, and the median.

It must be reported that, even as a book on descriptive statistics, it leaves 
much to be desired. The errors usually found in such books are to be found here 
as well as many more. There is the long discussion of skewness and kurtosis 
based on the false notion that moments are determined by the nature of the
distribution in the neighborhood of the mean. Certain properties of the normal
distribution are imputed to all distributions. Erroneous criteria for selecting
amongst the many means are given. The universality of the normal distribution 
seems exaggerated; thus, for example, referring to deviations from regressions 
(page 364): “Since the quantities (xo — So) are ‘errors’ we may regularly assume 
them to be normally distributed.” Population parameters and their estimates 
are confused. The book contains a great many statements (like the final one in 
the section on the Student distribution quoted above) which are so carelessly 
written that they have to be counted as errors. Several of the derivations and 
arguments are also carelessly constructed; an extreme example of this appears on
page 206: “Is the mean an unbiased statistic? $M = (x_a + x_b + x_c + \cdots + x_n)/N$.
Since the various x’s are independent, there are just N degrees of freedom
and M is unbiased.”

Students will likely have difficulty with this book. There is an air of arti¬ 
ficiality because of the omission of any discussion of population distributions and 
the notion of random sampling. Without any background of this kind it is 
hard to motivate the presentation, and the various topics become isolated. 
Moments are defined in terms of sample observations, and population moments 
are defined merely as the limits of these moments as the sample size becomes 
infinite. To introduce the mean, the author writes essentially: let us consider 
the function $f(b) = [\sum x^b/N]^{1/b}$. There is no pointing to the middle of a distribution
function, or even a sample, or a histogram. The variance is introduced the
same way; one considers the function $\sum |x_i - \bar{x}|^k/(N - 1)$. Technical terms
are used without definition; for example, in the passage about the mean quoted
above, the student suddenly encounters the word "unbiased" without definition
or previous discussion and must infer its meaning from the context.

Perhaps the best part of the book is the three chapters on correlation and regression.
The idea of correlation is here introduced with the discussion of a numerical
example, and several other topics are discussed in terms of examples. This part
of the book is very exhaustive; every sort of correlation coefficient is discussed
as is every sort of correction to such coefficients. But still the writing is careless,
and there is some confusion of ideas. The worst confusion occurs because the
distinction between normal and intraclass correlation is never brought out; the
discussion hops back and forth between the two ideas with no hint that they
are not the same thing. This part of the book, too, is in the style of statistics of
thirty years ago; the emphasis is on correlation coefficients rather than regression
coefficients.




NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items

Dr. Leo A. Aroian of Hunter College has been promoted to an assistant
professorship.

Mr. Carl A. Bennett is now with the General Electric Co., Hanford Engineering
Project, Richland, Washington, as an engineer in the Statistical Division.

Dr. Arthur B. Brown has been promoted from an Assistant Professor to an
Associate Professor of Mathematics at Queens College, Flushing, New York.

Professor Maurice H. Belz has returned to the University of Melbourne,
Carlton, Australia after having spent six months in the United States.

Dr. Edward E. Cureton, member of Richardson, Bellows, Henry & Co., Inc.,
industrial psychologists, is now at the United States Naval Air Station, Pensacola,
Florida working on a project with the Navy. The object of this project is
to improve ground school training, especially instructor training, in the Naval
Air Training Command.

Mr. Eric F. Gardner has accepted an assistant professorship at the School of
Education, Syracuse University, Syracuse, New York.

Mr. Leo S. Gunlogson, formerly with the Lumbermens Mutual Casualty Co.
at Chicago, is now with the Marketing Services Division, Carrier Corporation,
Syracuse, New York.

Dr. Theodore E. Harris has accepted a position with the Douglas Aircraft Co.,
Santa Monica, California.

Dr. Manuel O. Hizon, a former graduate student in the Mathematics Department,
University of Michigan, is now with the Bureau of Banking, Manila,
Philippines as Actuary-Examiner.

Mr. Julius Lieblein, formerly in the Treasury Department, Washington, D. C.,
has transferred to the Statistical Engineering Laboratory, National Bureau of
Standards, where he is working on problems in acceptance sampling and process
control.

Mr. Jack Moshman, formerly a tutor of mathematics at Queens College,
Flushing, New York, has been appointed to the staff of the Department of
Mathematics, University of Tennessee.

Dr. Horace W. Norton, formerly with the U. S. Weather Bureau, Washington,
D. C., as meteorologist, is now at Oak Ridge, Tennessee. His position there is
to study the application of statistics to reliability of weighings and analyses in
connection with accountability for source and fissionable materials.

Mr. Emil D. Schell of the Bureau of Labor Statistics has been appointed Chief
of the Mathematics and Electronic Computer Branch in the Office of the Comptroller,
United States Air Forces.

Miss Bernice Scherl, formerly with the Schenley Research Institute, Inc., New
York, has accepted a position as Statistician, Shell Oil Co., New York.



Dr. Irving E. Segal, who has been an assistant at the Institute for Advanced
Study at Princeton, New Jersey, has accepted an assistant professorship in the 
Mathematics Department, University of Chicago. 

Miss Rosedith Sitgreaves, assistant statistician in the United States Public 
Health Service, has returned to her position in Washington after doing advanced 
study at Columbia University. 

Dr. John E. Walsh, who received his doctor’s degree in mathematics from 
Princeton University last October, is now employed by Douglas Aircraft Co., 
Inc. of Santa Monica, California. 

Mr. Winfred P. Wilson, a former graduate student at the University of Michi¬ 
gan, has accepted an assistant professorship at the University of Houston, 
Houston, Texas in the Department of Mathematics. 


Announcement has been received of a new journal, The British Journal of 
Psychology, Statistical Section, which is published by the Council of the British
Psychological Society. The editors are Professor Sir Cyril Burt and Professor
Godfrey Thomson. The first issue has been published and later issues will be 
published as material warrants. Subscriptions and inquiries should be sent to 
the University of London Press, Ltd., Warwick Square, London, E. C. 4. 


Announcement of Navy Department Joint Board of U. S. Civil Service 

Examiners 

Implementing its scientific research and development program both geo¬ 
graphically and in new fields of endeavor, the Navy Department is currently 
expanding three comparatively new, permanent laboratories in California. 
Heretofore, the Navy Department’s scientific centers have been concentrated in 
the eastern and eastern seaboard areas. 

Two of the laboratories have been established as the logical outgrowth of 
programs carried on by universities during the war. The Naval Ordnance Test 
Station, China Lake (formerly Inyokern), California, 160 miles from Los Angeles,
was originally an activity of the California Institute of Technology. Its present 
program involves research, development and test work with ordnance equipment 
and explosives. The Navy Electronics Laboratory at San Diego, California is 
the outgrowth of work done by the University of California. It is concerned
with research, testing and development of electronic control devices, detection
equipment, instrumentation equipment and training aids. The Naval Air
Missile Test Center, at Point Mugu on the coast of California, 60 miles north of
Los Angeles, was established when the need for an installation became apparent
as the result of the Navy Department’s activities on guided missiles. The Test
Center's activities are concerned with flight and laboratory testing and evaluation
of guided missiles and their components.

Each of the establishments has current need for qualified personnel in a variety
of scientific fields to staff its laboratories. Recently completed at the Naval
Ordnance Test Station is Michelson Laboratory at a cost of $6,000,000. Many
more millions of dollars have been spent in equipment and facilities. Additional
construction and facilities are planned for both the Air Missile Test Center and
the Electronics Laboratory.

The work programs of the laboratories are planned, directed and accomplished
under the direction of an outstanding staff of civilian scientists. Extensive use
is made of the council method of operation. Constant liaison is maintained with
other research organizations, universities, scientific associations, and outstanding 
authorities throughout the nation. 

Professional positions are in the career service of the Federal government under
Civil Service laws. Examinations are now open in the three scientific establish¬ 
ments in the following professional fields: Chemist, Mathematician, Metallurgist, 
Meteorologist, Physicist, Statistician, Scientific Research Administrator and 
Scientific Staff Assistant. 

Examinations are also open in the following branches of the Engineering 
profession: Aeronautical, Chemical, Civil, Electrical, Electronics, General, 
Industrial, Material, Mechanical, Metallurgical, Ordnance, Safety and Structural.

Salaries for most of the positions range from $3397 to $9975 per annum.
Salaries are predicated on the level of ability, knowledge and experience required
to effectively discharge the duties of a specific position.

Further information may be obtained from the Navy Department Joint Board 
of U. S. Civil Service Examiners, 1030 East Green Street, Pasadena 1, California. 


Reorganization of Philosophy of Science Association 

The Philosophy of Science Association has been reorganized with Philipp Frank
of Harvard University as President; C. West Churchman of Wayne University,
Detroit, as Secretary-Treasurer.

The following are members of the Governing Committee: Gustav Bergmann,
State University of Iowa; Thomas A. Cowan, Wayne University; Clyde Kluckhohn,
Harvard University; Sebastian Littauer, Columbia University; F. S. C.
Northrop, Yale University.

The official journal of the Association is the Philosophy of Science of which
Professor C. West Churchman is Acting Editor. Manuscripts should be sent to
the Acting Editor. 

Applications for membership may be sent to the Secretary-Treasurer. Dues 
are $5.00 a year. 

The Association encourages the establishment of local groups in the philosophy 
of science. 


Columbia University Conference on Industrial Experimentation

The School of Engineering of Columbia University in the City of New York
announces an Intersession Five-day Intensive Training Conference on Industrial
Experimentation to be offered September 14-18, 1948 by the Department of Industrial
Engineering in cooperation with the Department of Mathematical Statistics
of the Graduate Faculty of Political Science.

The lecturing will be shared by Professors S. B. Littauer and J. Wolfowitz and
a staff of special lecturers drawn from industry.

A descriptive brochure will be ready for mailing in the latter part of July. 
For further details, interested persons may communicate directly with Professor 
S. B. Littauer, Department of Industrial Engineering, Columbia University, New
York 27, New York.


New Members 

The following persons have been elected to membership in the Institute
(December 1, 1947 to February 28, 1948)

Angulo, Walter J., B.E. (Johns Hopkins Univ.) Graduate student at Johns Hopkins University,
SSlSS Beaufort Ave., Baltimore IB, Maryland.

Beard, Helen P., Ph.D. (Mass. Institute of Tech.) Assistant Professor of Mathematics,
Newcomb College, New Orleans 18, Louisiana.

Blomquist, Nils G., (Univ. of Stockholm) Statistician, Sverige Reinsurance Company,
Aladdinsvagen 47, Smedslatten, Sweden.

Bodwell, Charles A., M.S. (Univ. of Michigan) Graduate student at the University of
Michigan, Box 773, West Lodge, Ypsilanti, Michigan.

Burnett, Jean, M.S. (Mich. State College) Instructor in Mathematics, Michigan State
College, 702 Cherry Lane, East Lansing, Michigan.

Burton, Robert E., Student at Michigan University, 7239 Atkinson Avenue, Detroit 2,
Michigan.

Byrd, Paul F., M.S. (Univ. of Chicago) Meteorologist, U.S.A.F., Weather Detachment,
Lockbourne Air Base, Columbus 17, Ohio.

Cernuschi, Felix, Ph.D. (Univ. of Cambridge) Professor at the University of Montevideo,
Asociacion Uruguaya de Estadistica, Av. Agraciada 1464, Montevideo, Uruguay.

Connor, William S., Jr., M.A. (Univ. of North Carolina) Associate Professor of Economics,
University of Kentucky, College of Commerce, Lexington, Kentucky.

Dalenius, Tore, Fil. kand. Hastholmsvagen 16, Stockholm, Sweden.

Davis, Roderic C., M.S. (Calif. Institute of Tech.) P-6 Mathematician, Head of Assessment
Section, P. O. Box N-467, N.O.T.S., Inyokern, California.

Dvoretzky, Aryeh, Ph.D. (Hebrew Univ., Jerusalem) Research Fellow at Hebrew University,
Jerusalem, % American Friends Hebrew University, B East 89th St., New York.

Gardner, Robert S., M.S. (Tulane Univ.) Instructor in Statistics, Mathematics Department
at Ohio State University, BIS W. Eleventh Avenue, Columbus, Ohio.

Goins, Mary, M.S. (Univ. of Mich.) Assistant Professor of Mathematics, Marshall College,
IjSB Ninth Avenue, Huntington, West Virginia.

Hratz, Joseph A., B.A. (St. Ambrose College) Instructor of Mathematics, St. Ambrose
College, Davenport, Iowa.

Kelly, Harriet J., Ph.D. (Univ. of Iowa) Research Associate, Head of Statistical Dept.,
Children’s Fund of Michigan, 000 Frederick, Detroit 2, Michigan.

Kincaid, Wilfred M., Ph.D. (Brown Univ.) Instructor in Mathematics, University of
Michigan, Ann Arbor, Michigan.

Kish, Leslie, B.S. (College of the City of N. Y.) Senior Sampling Statistician, 707 Mt. Pleasant,
Ann Arbor, Michigan.

Lehr, Marguerite, Ph.D. (Bryn Mawr, Pa.) Associate Professor of Mathematics, Bryn Mawr
College, Cartref, Bryn Mawr, Pennsylvania.

Loizelier, Enrique Blanco, M.A. (Madrid Univ.) Professor of Statistics, Faculty of Economics,
Madrid University, Nervion 4, Madrid, Spain.

Lorenzo, Cesar M., M.A. (American Univ., Wash., D. C.) Statistician, Food and Agriculture
Organization of the United Nations, nSB De Sales Street, N.W., Washington 6, D. C.

Lott, Fred W., Jr., M.A. (Univ. of Mich.) Teaching Fellow at the University of Michigan,
1£S3 Malden Court, Willow Run, Michigan.

Mantel, Nathan, B.S. (City College of New York) Biostatistician, U. S. Public Health Service,
Lily Pawls Dr., N.E., Washington 10, D. C.

Marrian, Dixon M., A.M. (Columbia Univ.) Master at the Gilman Country School, Baltimore,
Md., ISOfi Shadyside Road, Baltimore 18, Maryland.

Martins, Octavio Augusto L., M.A. (Columbia Univ.) Tecnico de Educacao, Department of
National Education, Rio de Janeiro, Rua Figueiredo de Magalhaes Apt. 100, Rio de
Janeiro, Brazil.

Meyer, Herbert Albert, Ph.D. (Univ. of Iowa) Associate Professor of Mathematics, 3015
West Columbia, Gainesville, Florida.

Nikitich, Nicholas, B.C.S. (New York Univ.) Timekeeper, SO Featherbed Lane, New York
53, New York.

Oakland, Gail Barker, M.A. (Univ. of Minnesota) Associate Professor of Statistics, University
of Manitoba, Winnipeg, Canada.

Olkin, Ingram, B.S. (College of City of N. Y.) Graduate Student at Columbia University,
345 Fort Washington Ave., New York S3, New York.


REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 


The thirty-third meeting of the Institute of Mathematical Statistics was held 
at Columbia University, New York City, New York on Wednesday afternoon
and Thursday, April 14 and 15, 1948. The meeting was attended by 158 persons,
including the following 78 members of the Institute: 

M. Afzal, L. A. Aroian, R. M. Auer, W. D. Baten, R. E. Bechhofer, J. H. Bushey, J. M.
Cameron, B. H. Camp, G. G. Campbell, S. D. Cantor, Manuel Cynamon, Tore Dalenius,
J. F. Daly, J. L. Doob, C. W. Dunnett, Aryeh Dvoretzky, Churchill Eisenhart, Benjamin
Epstein, M. W. Eudey, D. A. Fraser, M. A. Geisler, Mary Goins, H. H. Goode, E. J.
Gumbel, M. H. Hansen, Mina Haskind, L. H. Herbach, S. M. Ikhtiar-ul-Mulk, Seymour
Jablon, L. F. Knudsen, Jack Laderman, Howard Levene, S. B. Littauer, F. M. Lord,
Irving Lorge, Eugene Lukacs, W. G. Madow, Sophie Marcuse, Robert Mirsky, E. B.
Mode, D. J. Morrow, Frederick Mosteller, D. N. Nanda, P. M. Neurath, G. E. Noether,
M. L. Norden, Ingram Olkin, P. S. Olmstead, A. L. O’Toole, Katharine Pease, E. J. Pitman,
W. A. Reynolds, J. S. Rhodes, H. E. Robbins, H. G. Romig, Ernest Rubin, Herman
Rubin, P. J. Rulon, Frank Saidel, G. R. Seth, M. A. Schlorek, S. S. Shrikhande, Rosedith
Sitgreaves, Milton Sobel, Emma Spaney, F. F. Stephan, B. R. Suydam, Henry
Teicher, J. W. Tukey, A. Wald, H. M. Walker, J. E. Walsh, S. S. Wilks, Dzung-shu Wei,
Lionel Weiss, Jacob Wolfowitz, C. A. Wright, Mohammad Yusuf.

The Wednesday afternoon session. Professor S. B. Littauer of Columbia Uni¬ 
versity presiding, was devoted to the following two invited addresses: 

1. Incomplete Block Designs
Professor R. C. Bose, Calcutta University and the University of North Carolina

2. Non-Parametric Inference
Professor E. J. G. Pitman, University of Tasmania and Columbia University

The Thursday morning session, Professor Hobart Bushey of Hunter College
presiding, consisted of a Symposium on Scales of Measurement at which two
invited papers

1. The Development of Psychological Scaling Techniques
Professor Harold Gulliksen, Princeton University

2. A Generalized Model for Scales
Professor Paul Lazarsfeld, Columbia University

were followed by prepared discussion by Professors Phillip Rulon of Harvard
University and John Tukey of Princeton University.

The Thursday afternoon session, Dr. Harry G. Romig of Bell Telephone
Laboratories presiding, was devoted to the following contributed papers:

1. Optimum Character of the Sequential Probability Ratio Test
Professors Abraham Wald and Jacob Wolfowitz, Columbia University

2. Multi-parameter Sequential Estimation
Mr. G. R. Seth, Columbia University




3. The Distribution of a Definite Quadratic Form
Professor Herbert Robbins, University of North Carolina

4. The Moments and Cumulants of the Product of 3 or 4 Dependent Variables (Preliminary
Report)
Professor Leo A. Aroian, Hunter College

5. Generalization to N Dimensions of Inequalities of the Tchebycheff Type
Professor Burton H. Camp, Wesleyan University

6. On the Power Function of a Sign Test Formed by Using Subsamples
Dr. John E. Walsh, Project Rand

7. The Distribution of 'P“, a Multivariate Generalization of the F-test
Miss Dorothy J. Morrow, University of North Carolina

8. Approximate Confidence Points (Preliminary Report)
Professor John Tukey, Princeton University

At all of the sessions there was active discussion from the floor.

On Wednesday evening members and guests had dinner at the Men’s Faculty 
Club. 


S. B. Littauer 
Assistant Secretary 





















A CLASS OF STATISTICS WITH ASYMPTOTICALLY NORMAL 

DISTRIBUTION¹

By Wassily Hoeffding
Institute of Statistics, University of North Carolina 


1. Summary. Let $X_1, \ldots, X_n$ be $n$ independent random vectors,
$X_\nu = (X_\nu^{(1)}, \ldots, X_\nu^{(r)})$, and $\Phi(x_1, \ldots, x_m)$ a function of $m$ ($\le n$) vectors
$x_\nu = (x_\nu^{(1)}, \ldots, x_\nu^{(r)})$. A statistic of the form
$U = \sum'' \Phi(X_{\alpha_1}, \ldots, X_{\alpha_m}) / n(n-1) \cdots (n-m+1)$, where the sum $\sum''$ is extended over all
permutations $(\alpha_1, \ldots, \alpha_m)$ of $m$ different integers, $1 \le \alpha_i \le n$, is called a U-statistic. If
$X_1, \ldots, X_n$ have the same (cumulative) distribution function (d.f.) $F(x)$, $U$ is an
unbiased estimate of the population characteristic

$\theta(F) = \int \cdots \int \Phi(x_1, \ldots, x_m)\, dF(x_1) \cdots dF(x_m)$.

$\theta(F)$ is called a regular functional of the d.f. $F(x)$.
Certain optimal properties of U-statistics as unbiased estimates of regular functionals
have been established by Halmos [9] (cf. Section 4).

The variance of a U-statistic as a function of the sample size $n$ and of certain
population characteristics is studied in Section 5.

It is shown that if $X_1, \ldots, X_n$ have the same distribution and $\Phi(x_1, \ldots, x_m)$
is independent of $n$, the d.f. of $\sqrt{n}(U - \theta)$ tends to a normal d.f. as $n \to \infty$
under the sole condition of the existence of $E\Phi^2(X_1, \ldots, X_m)$. Similar results
hold for the joint distribution of several U-statistics (Theorems 7.1 and 7.2),
for statistics $U'$ which, in a certain sense, are asymptotically equivalent to $U$
(Theorems 7.3 and 7.4), for certain functions of statistics $U$ or $U'$ (Theorem 7.5)
and, under certain additional assumptions, for the case of the $X_\nu$'s having different
distributions (Theorems 8.1 and 8.2). Results of a similar character,
though under different assumptions, are contained in a recent paper by
von Mises [18] (cf. Section 7).

Examples of statistics of the form $U$ or $U'$ are the moments, Fisher's k-statistics,
Gini's mean difference, and several rank correlation statistics such as Spearman's
rank correlation and the difference sign correlation (cf. Section 9).
Asymptotic power functions for the non-parametric tests of independence based
on these rank statistics are obtained. They show that these tests are not unbiased
in the limit (Section 9f). The asymptotic distribution of the coefficient
of partial difference sign correlation which has been suggested by Kendall also
is obtained (Section 9h).


2. Functionals of distribution functions. Let $F(x) = F(x^{(1)}, \ldots, x^{(r)})$ be
an $r$-variate d.f. If to any $F$ belonging to a subset $\mathcal{D}$ of the set of all d.f.'s in the
$r$-dimensional Euclidean space is assigned a quantity $\theta(F)$, then $\theta(F)$ is called a
functional of $F$, defined on $\mathcal{D}$. In this paper the word functional will always
mean functional of a d.f.

¹ Research under a contract with the Office of Naval Research for development of
multivariate statistical theory.

An infinite population may be considered as completely determined by its
d.f., and any numerical characteristic of an infinite population with d.f. $F$ that
is used in statistics is a functional of $F$. A finite population, or sample, of size $n$
is determined by its d.f., $S(x)$ say, and its size $n$. $n$ itself is not a functional of $S$
since two samples of different size may have the same d.f.

If $S(x^{(1)}, \ldots, x^{(r)})$ is the d.f. of a finite population, or a sample, consisting
of $n$ elements

(2.1)    $x_\alpha = (x_\alpha^{(1)}, \ldots, x_\alpha^{(r)})$,    $(\alpha = 1, \ldots, n)$,

then $nS(x^{(1)}, \ldots, x^{(r)})$ is the number of elements $x_\alpha$ such that

$x_\alpha^{(1)} \le x^{(1)}, \ldots, x_\alpha^{(r)} \le x^{(r)}$.

Since $S(x^{(1)}, \ldots, x^{(r)})$ is symmetric in $x_1, \ldots, x_n$, and retains its value for a
sample formed from the sample (2.1) by adding one or more identical samples,
the same two properties hold true for a sample functional $\theta(S)$. Most statistics
in current use are functions of $n$ and of functionals of the sample d.f.
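The two properties of the sample d.f. noted above can be checked on a small example. The following is only an illustrative sketch: the function name `sample_df`, the bivariate sample and the evaluation point are assumptions made for this example and do not come from the paper.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]

def sample_df(sample: Sequence[Vector], x: Vector) -> float:
    """Value S(x) of the sample d.f.: the fraction of elements whose
    every component is <= the corresponding component of x."""
    count = sum(all(a <= b for a, b in zip(elem, x)) for elem in sample)
    return count / len(sample)

# Hypothetical bivariate sample (r = 2, n = 3) and evaluation point.
sample = [(0.2, 1.0), (0.5, 0.3), (0.9, 0.8)]
point = (0.6, 1.0)

s_once = sample_df(sample, point)
s_reordered = sample_df(sample[::-1], point)    # symmetry in x_1, ..., x_n
s_doubled = sample_df(sample + sample, point)   # an identical sample added
```

Here `s_once`, `s_reordered` and `s_doubled` coincide although the doubled sample has size 6 instead of 3, which is why $n$ itself cannot be a functional of $S$.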

A random sample $\{X_1, \ldots, X_n\}$ is a set of $n$ independent random vectors

(2.2)    $X_\alpha = (X_\alpha^{(1)}, \ldots, X_\alpha^{(r)})$,    $(\alpha = 1, \ldots, n)$.

For any fixed values $x^{(1)}, \ldots, x^{(r)}$, the d.f. $S(x^{(1)}, \ldots, x^{(r)})$ of a random sample
is a random variable. The functional $\theta(S)$, where $S$ is the d.f. of the random
sample, is itself a random variable, and may be called a random functional.

A remarkable application of the theory of functionals to functionals of d.f.'s
has been made by von Mises [18] who considers the asymptotic distributions of
certain functionals of sample d.f.'s. (Cf. also Section 7.)


3. Unbiased estimation and regular functionals. Consider a functional
$\theta = \theta(F)$ of the $r$-variate d.f. $F(x) = F(x^{(1)}, \ldots, x^{(r)})$, and suppose that for some
sample size $n$, $\theta$ admits an unbiased estimate for any d.f. $F$ in $\mathcal{D}$. That is, if
$X_1, \ldots, X_n$ are $n$ independent random vectors with the same d.f. $F$, there exists
a function $\varphi(x_1, \ldots, x_n)$ of $n$ vector arguments (2.1) such that the expected
value of $\varphi(X_1, \ldots, X_n)$ is equal to $\theta(F)$, or

(3.1)    $\int \cdots \int \varphi(x_1, \ldots, x_n)\, dF(x_1) \cdots dF(x_n) = \theta(F)$

for every $F$ in $\mathcal{D}$. Here and in the sequel, when no integration limits are indicated,
the integral is extended over the entire space of $x_1, \ldots, x_n$. The integral
is understood in the sense of Stieltjes-Lebesgue.

The estimate $\varphi(x_1, \ldots, x_n)$ of $\theta(F)$ is called unbiased over $\mathcal{D}$.

A functional $\theta(F)$ of the form (3.1) will be referred to as regular over $\mathcal{D}$.²
Thus, the functionals regular over $\mathcal{D}$ are those admitting an unbiased estimate
over $\mathcal{D}$.

² This is an adaptation to functionals of d.f.'s of the term "regular functional" used by
Volterra [21].

If $\theta(F)$ is regular over $\mathcal{D}$, let $m$ ($\le n$) be the smallest sample size for which there
exists an unbiased estimate $\Phi(x_1, \ldots, x_m)$ of $\theta$ over $\mathcal{D}$:

(3.2)    $\theta(F) = \int \cdots \int \Phi(x_1, \ldots, x_m)\, dF(x_1) \cdots dF(x_m)$

for any $F$ in $\mathcal{D}$. Then $m$ will be called the degree over $\mathcal{D}$ of the regular functional
$\theta(F)$.

If the expected value of $\varphi(X_1, \ldots, X_n)$ is equal to $\theta(F)$ whenever it exists,
$\varphi(x_1, \ldots, x_n)$ will be called a distribution-free unbiased estimate (d.-f. u. e.) of $\theta(F)$.
The degree of $\theta(F)$ over the set $\mathcal{D}_0$ of d.f.'s $F$ for which the right hand side of (3.1)
exists will be simply termed the degree of $\theta(F)$.

A regular functional of degree 1 over $\mathcal{D}$ is called a linear regular functional
over $\mathcal{D}$. If $\theta(F)$ has the same value for all $F$ in $\mathcal{D}$, $\theta(F)$ may be termed a regular
functional of degree zero over $\mathcal{D}$.

Any function $\Phi(x_1, \ldots, x_m)$ satisfying (3.2) will be referred to as a kernel of
the regular functional $\theta(F)$.

For any regular functional $\theta(F)$ there exists a kernel $\Phi_0(x_1, \ldots, x_m)$ symmetric
in $x_1, \ldots, x_m$. For if $\Phi(x_1, \ldots, x_m)$ is a kernel of $\theta(F)$,

(3.3)    $\Phi_0(x_1, \ldots, x_m) = \frac{1}{m!} \sum \Phi(x_{\alpha_1}, \ldots, x_{\alpha_m})$,

where the sum is taken over all permutations $(\alpha_1, \ldots, \alpha_m)$ of $(1, \ldots, m)$, is a
symmetric kernel of $\theta(F)$.
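The symmetrization (3.3) can be written out directly. The sketch below is illustrative only: the helper name `symmetrize` and the example kernel $x_1^2 - x_1 x_2$ (a non-symmetric kernel of the variance) are assumptions chosen for this example.

```python
from itertools import permutations
from math import factorial

def symmetrize(kernel, m):
    """Return the symmetric kernel of (3.3): the average of `kernel`
    over all m! orderings of its m arguments."""
    def symmetric_kernel(*xs):
        if len(xs) != m:
            raise ValueError("expected exactly m arguments")
        total = sum(kernel(*(xs[i] for i in perm))
                    for perm in permutations(range(m)))
        return total / factorial(m)
    return symmetric_kernel

# Hypothetical example: a non-symmetric kernel of the variance.
raw_kernel = lambda x1, x2: x1 * x1 - x1 * x2
sym_kernel = symmetrize(raw_kernel, 2)

value_ab = sym_kernel(3.0, 1.0)
value_ba = sym_kernel(1.0, 3.0)
```

For this kernel the symmetrized version agrees with $(x_1 - x_2)^2/2$, so both calls return the same number regardless of argument order.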

If $\theta_1(F)$ and $\theta_2(F)$ are two regular functionals of degrees $m_1$ and $m_2$ over $\mathcal{D}$,
then the sum $\theta_1(F) + \theta_2(F)$ and the product $\theta_1(F)\theta_2(F)$ are regular functionals
of degrees $\le m = \mathrm{Max}(m_1, m_2)$ and $\le m_1 + m_2$, respectively, over $\mathcal{D}$. For if
$\Phi_i(x_1, \ldots, x_{m_i})$ is a kernel of $\theta_i(F)$, $(i = 1, 2)$, then

$\theta_1(F) + \theta_2(F) = \int \cdots \int \{\Phi_1(x_1, \ldots, x_{m_1}) + \Phi_2(x_1, \ldots, x_{m_2})\}\, dF(x_1) \cdots dF(x_m)$

and

$\theta_1(F)\theta_2(F) = \int \cdots \int \Phi_1(x_1, \ldots, x_{m_1})\, \Phi_2(x_{m_1+1}, \ldots, x_{m_1+m_2})\, dF(x_1) \cdots dF(x_{m_1+m_2})$.

More generally, a polynomial in regular functionals is itself a regular functional.

Examples of linear regular functionals are the moments about the origin,

$\mu'_{\nu_1 \ldots \nu_r} = \int \cdots \int (x^{(1)})^{\nu_1} \cdots (x^{(r)})^{\nu_r}\, dF(x^{(1)}, \ldots, x^{(r)})$.

A moment about the mean is a polynomial in moments $\mu'$ about 0, and hence a
regular functional over the set $\mathcal{D}_0$ of d.f.'s for which it exists (cf. Halmos [9]).
For instance, the variance of $X^{(1)}$,

$\sigma^2 = \int \int \{(x_1^{(1)})^2 - x_1^{(1)} x_2^{(1)}\}\, dF(x_1^{(1)})\, dF(x_2^{(1)})$,

is a regular functional of degree 2. A symmetrical kernel of $\sigma^2$ is $(x_1^{(1)} - x_2^{(1)})^2/2$.
If $\mathcal{D}$ is the set of univariate d.f.'s with mean $\mu$ and existing second moment, $\sigma^2$
is a linear regular functional of $F$ over $\mathcal{D}$, since then we have

$\sigma^2 = \int (x^{(1)} - \mu)^2\, dF(x^{(1)})$.

The function

$s^2 = \frac{1}{n(n-1)} \sum_{\alpha \ne \beta} \frac{(x_\alpha - x_\beta)^2}{2} = \frac{1}{n-1} \sum_{\alpha=1}^{n} \Big( x_\alpha - \frac{1}{n} \sum_{\beta=1}^{n} x_\beta \Big)^2$

is a distribution-free unbiased estimate of $\sigma^2$. The function




is known to be an unbiased estimate of $\sigma$ over the set of univariate normal d.f.'s,
but it is not a d.-f. u. e.


4. U-statistics. Let $x_1, \ldots, x_n$ be a sample of $n$ vectors (2.1) and
$\Phi(x_1, \ldots, x_m)$ a function of $m$ ($\le n$) vector arguments. Consider the function
of the sample,

(4.1)    $U = U(x_1, \ldots, x_n) = \frac{1}{n(n-1) \cdots (n-m+1)} \sum'' \Phi(x_{\alpha_1}, \ldots, x_{\alpha_m})$,

where $\sum''$ stands for summation over all permutations $(\alpha_1, \ldots, \alpha_m)$ of $m$ integers
such that

(4.2)    $1 \le \alpha_i \le n$,  $\alpha_i \ne \alpha_j$ if $i \ne j$,    $(i, j = 1, \ldots, m)$.

$U$ is the average of the values of $\Phi$ in the set of ordered subsets of $m$ members
of the sample (2.1). $U$ is symmetric in $x_1, \ldots, x_n$.

Any statistic of the form (4.1) will be called a U-statistic. Any function
$\Phi(x_1, \ldots, x_m)$ satisfying (4.1) will be referred to as a kernel of the statistic $U$.

If $\Phi(x_1, \ldots, x_m)$ is a kernel of a regular functional $\theta(F)$ defined on a set $\mathcal{D}$,
then $U$ is an unbiased estimate of $\theta(F)$ over $\mathcal{D}$:

(4.3)    $\theta(F) = \int \cdots \int U(x_1, \ldots, x_n)\, dF(x_1) \cdots dF(x_n)$

for every $F$ in $\mathcal{D}$.

For $n = m$, $U$ reduces to the symmetric kernel (3.3) of $\theta(F)$.

From a recent paper by Halmos [9] it follows for the case of univariate d.f.'s
$(r = 1)$:

If $\theta(F)$ is a regular functional of degree $m$ over a set $\mathcal{D}$ containing all purely
discontinuous d.f.'s, $U$ is the only unbiased estimate over $\mathcal{D}$ which is symmetric
in $x_1, \ldots, x_n$, and $U$ has the least variance among all unbiased estimates
over $\mathcal{D}$.

These results and the proofs given by Halmos can easily be extended to the
multivariate case $(r > 1)$.

Combining (3.3) and (4.1) we may write a U-statistic in the form

(4.4)    $U(x_1, \ldots, x_n) = \binom{n}{m}^{-1} \sum' \Phi_0(x_{\alpha_1}, \ldots, x_{\alpha_m})$,

where the kernel $\Phi_0$ is symmetric in its $m$ vector arguments and the sum $\sum'$ is
extended over all subscripts $\alpha$ such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n$.
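As an illustration of the form (4.4), the following sketch averages a symmetric kernel over all $m$-element subsets of a sample. The function name `u_statistic`, the data and the two kernels are assumptions made for this example; the brute-force enumeration is meant for small samples only.

```python
from itertools import combinations
from math import comb
from statistics import variance

def u_statistic(sample, kernel, m):
    """U-statistic in the form (4.4): the average of a symmetric kernel
    over all subsets 1 <= a_1 < ... < a_m <= n of the sample."""
    n = len(sample)
    if m > n:
        raise ValueError("the degree m must not exceed the sample size n")
    total = sum(kernel(*subset) for subset in combinations(sample, m))
    return total / comb(n, m)

# Hypothetical univariate data and two symmetric kernels of degree 2.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
variance_kernel = lambda a, b: (a - b) ** 2 / 2    # kernel of the variance
difference_kernel = lambda a, b: abs(a - b)        # kernel of the mean difference

u_var = u_statistic(data, variance_kernel, 2)
u_diff = u_statistic(data, difference_kernel, 2)
```

With the kernel $(x_1 - x_2)^2/2$ the result agrees with the usual unbiased sample variance (`statistics.variance`), and with $|x_1 - x_2|$ it is the average absolute difference over all pairs.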


Another statistic frequently used for estimating $\theta(F)$ is $\theta(S)$, where $S = S(x)$
is the d.f. of the sample (2.1). If $S$ is substituted for $F$ in (3.2), we have

(4.5)    $\theta(S) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(x_{\alpha_1}, \ldots, x_{\alpha_m})$.

In particular, the sample moments have this form; their kernel $\Phi$ is obtained
by the method described in section 3.

If $m = 1$, $\theta(S) = U$. If $m = 2$,

$\theta(S) = \frac{n-1}{n}\, U + \frac{1}{n} \Big( \frac{1}{n} \sum_{\alpha=1}^{n} \Phi(x_\alpha, x_\alpha) \Big)$,

and $\theta(S)$ is a linear function of U-statistics with coefficients depending on $n$.
This is easily seen to be true for any $m$. In general $\theta(S)$ is not an unbiased estimate
of $\theta(F)$. If, however, the expected value of $\theta(S)$ exists for every $F$ in $\mathcal{D}$,
we have

$E\{\theta(S)\} = \theta(F) + O(n^{-1})$,

and the estimate $\theta(S)$ of $\theta(F)$ may be termed unbiased in the limit over $\mathcal{D}$.
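The relation between $\theta(S)$ and $U$ for $m = 2$, namely $\theta(S) = \frac{n-1}{n} U + \frac{1}{n}\big(\frac{1}{n}\sum_\alpha \Phi(x_\alpha, x_\alpha)\big)$, can be checked numerically. The sketch below is illustrative: the function names, the kernel and the data are assumptions made for this example.

```python
from itertools import combinations, product
from math import comb

def u_stat(sample, kernel):
    """U-statistic of degree 2 with a symmetric kernel (distinct pairs only)."""
    n = len(sample)
    return sum(kernel(a, b) for a, b in combinations(sample, 2)) / comb(n, 2)

def theta_of_sample_df(sample, kernel):
    """theta(S) for m = 2: the average over all n^2 ordered pairs,
    pairs with a repeated element included."""
    n = len(sample)
    return sum(kernel(a, b) for a, b in product(sample, repeat=2)) / n ** 2

# Hypothetical symmetric kernel with kernel(a, a) != 0, and hypothetical data.
kernel = lambda a, b: a * b + abs(a - b)
data = [1.0, 2.0, 4.0, 7.0]
n = len(data)

u = u_stat(data, kernel)
v = theta_of_sample_df(data, kernel)
diagonal_mean = sum(kernel(a, a) for a in data) / n
right_hand_side = (n - 1) / n * u + diagonal_mean / n
```

The two quantities `v` and `right_hand_side` agree, and their difference from `u` is of order $1/n$, which is the source of the $O(n^{-1})$ bias term.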

Numerous statistics in current use have the form of, or can be expressed in
terms of U-statistics. From what was said above about moments as regular
functionals, it is easy to obtain U-statistics which are d.-f. u. e.'s of the moments
about the mean of any order (cf. Halmos [9]). Fisher's k-statistics are U-statistics,
as follows from their definition as unbiased estimates of the cumulants,
symmetric in the sample values. Another example is Gini's mean difference

$\frac{1}{n(n-1)} \sum_{\alpha \ne \beta} |x_\alpha^{(1)} - x_\beta^{(1)}|$.

More examples, in particular of rank correlation statistics, will be given in
section 9.


5. The variance of a U-statistic. Let $X_1, \ldots, X_n$ be $n$ independent random
vectors with the same d.f. $F(x) = F(x^{(1)}, \ldots, x^{(r)})$, and let

(5.1)    $U = U(X_1, \ldots, X_n) = \binom{n}{m}^{-1} \sum' \Phi(X_{\alpha_1}, \ldots, X_{\alpha_m})$,

where $\Phi(x_1, \ldots, x_m)$ is symmetric in $x_1, \ldots, x_m$ and $\sum'$ has the same meaning
as in (4.4). Suppose that the function $\Phi$ does not involve $n$.

If $\theta = \theta(F)$ is defined by (3.2), we have

$E\{U\} = E\{\Phi(X_1, \ldots, X_m)\} = \theta$.

Let

(5.2)    $\Phi_c(x_1, \ldots, x_c) = E\{\Phi(x_1, \ldots, x_c, X_{c+1}, \ldots, X_m)\}$,    $(c = 1, \ldots, m)$,

where $x_1, \ldots, x_c$ are arbitrary fixed vectors and the expected value is taken with
respect to the random vectors $X_{c+1}, \ldots, X_m$. Then

(5.3)    $\Phi_{c-1}(x_1, \ldots, x_{c-1}) = E\{\Phi_c(x_1, \ldots, x_{c-1}, X_c)\}$,

and

(5.4)    $E\{\Phi_c(X_1, \ldots, X_c)\} = \theta$,    $(c = 1, \ldots, m)$.

Define

(5.5)    $\Psi(x_1, \ldots, x_m) = \Phi(x_1, \ldots, x_m) - \theta$,

(5.6)    $\Psi_c(x_1, \ldots, x_c) = \Phi_c(x_1, \ldots, x_c) - \theta$,    $(c = 1, \ldots, m)$.

We have

(5.7)    $\Psi_{c-1}(x_1, \ldots, x_{c-1}) = E\{\Psi_c(x_1, \ldots, x_{c-1}, X_c)\}$,

(5.8)    $E\{\Psi_c(X_1, \ldots, X_c)\} = E\{\Psi(X_1, \ldots, X_m)\} = 0$,    $(c = 1, \ldots, m)$.

Suppose that the variance of $\Psi_c(X_1, \ldots, X_c)$ exists, and let

(5.9)    $\zeta_0 = 0$,  $\zeta_c = E\{\Psi_c^2(X_1, \ldots, X_c)\}$,    $(c = 1, \ldots, m)$.

We have

(5.10)    $\zeta_c = E\{\Phi_c^2(X_1, \ldots, X_c)\} - \theta^2$.

$\zeta_c = \zeta_c(F)$ is a polynomial in regular functionals of $F$, and hence itself a regular
functional of $F$ (of degree $\le 2m$).

If, for some parent distribution $F = F_0$ and some integer $d$, we have $\zeta_d(F_0) = 0$,
this means that $\Psi_d(X_1, \ldots, X_d) = 0$ with probability 1. By (5.7) and (5.9),
$\zeta_d = 0$ implies $\zeta_1 = \cdots = \zeta_{d-1} = 0$.



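If $\zeta_1(F_0) = 0$, we shall say that the regular functional $\theta(F)$ is stationary³
for $F = F_0$. If

(5.11)    $\zeta_1(F_0) = \cdots = \zeta_d(F_0) = 0$,  $\zeta_{d+1}(F_0) > 0$,    $(1 \le d \le m)$,

$\theta(F)$ will be called stationary of order $d$ for $F = F_0$.

If $(\alpha_1, \ldots, \alpha_m)$ and $(\beta_1, \ldots, \beta_m)$ are two sets of $m$ different integers,
$1 \le \alpha_i, \beta_i \le n$, and $c$ is the number of integers common to the two sets, we have, by the
symmetry of $\Psi$,

(5.12)    $E\{\Psi(X_{\alpha_1}, \ldots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \ldots, X_{\beta_m})\} = \zeta_c$.

If the variance of $U$ exists, it is equal to

$\sigma^2(U) = \binom{n}{m}^{-2} E\{\sum' \Psi(X_{\alpha_1}, \ldots, X_{\alpha_m})\}^2 = \binom{n}{m}^{-2} \sum_{c=0}^{m} \sum^{(c)} E\{\Psi(X_{\alpha_1}, \ldots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \ldots, X_{\beta_m})\}$,

where $\sum^{(c)}$ stands for summation over all subscripts such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n$,  $1 \le \beta_1 < \beta_2 < \cdots < \beta_m \le n$,

and exactly $c$ equations

$\alpha_i = \beta_j$

are satisfied. By (5.12), each term in $\sum^{(c)}$ is equal to $\zeta_c$. The number of terms
in $\sum^{(c)}$ is easily seen to be

$\frac{n(n-1) \cdots (n-2m+c+1)}{c!\,(m-c)!\,(m-c)!} = \binom{m}{c} \binom{n-m}{m-c} \binom{n}{m}$,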

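and hence, since $\zeta_0 = 0$,

(5.13)    $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c}\, \zeta_c$.

When the distributions of $X_1, \ldots, X_n$ are different, $F_\nu(x)$ being the d.f. of
$X_\nu$, let

(5.14)    $\theta_{\alpha_1, \ldots, \alpha_m} = E\{\Phi(X_{\alpha_1}, \ldots, X_{\alpha_m})\}$,

(5.15)    $\Psi_{c(\alpha_1, \ldots, \alpha_c)\beta_1, \ldots, \beta_{m-c}}(x_1, \ldots, x_c) = E\{\Phi(x_1, \ldots, x_c, X_{\beta_1}, \ldots, X_{\beta_{m-c}})\} - \theta_{\alpha_1, \ldots, \alpha_c, \beta_1, \ldots, \beta_{m-c}}$,    $(c = 1, \ldots, m)$,

(5.16)    $\zeta_{c(\alpha_1, \ldots, \alpha_c)\beta_1, \ldots, \beta_{m-c};\, \gamma_1, \ldots, \gamma_{m-c}} = E\{\Psi_{c(\alpha_1, \ldots, \alpha_c)\beta_1, \ldots, \beta_{m-c}}(X_{\alpha_1}, \ldots, X_{\alpha_c})\, \Psi_{c(\alpha_1, \ldots, \alpha_c)\gamma_1, \ldots, \gamma_{m-c}}(X_{\alpha_1}, \ldots, X_{\alpha_c})\}$,

(5.17)    $\zeta_{c,n} = \frac{c!\,(m-c)!\,(m-c)!}{n(n-1) \cdots (n-2m+c+1)} \sum \zeta_{c(\alpha_1, \ldots, \alpha_c)\beta_1, \ldots, \beta_{m-c};\, \gamma_1, \ldots, \gamma_{m-c}}$,

where the sum is extended over all subscripts $\alpha, \beta, \gamma$ such that

$1 \le \alpha_1 < \cdots < \alpha_c \le n$,  $1 \le \beta_1 < \cdots < \beta_{m-c} \le n$,  $1 \le \gamma_1 < \cdots < \gamma_{m-c} \le n$,

$\alpha_i \ne \beta_j$,  $\alpha_i \ne \gamma_j$,  $\beta_i \ne \gamma_j$.

Then the variance of $U$ is equal to

(5.18)    $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c}\, \zeta_{c,n}$.

³ According to the definition of the derivative of a functional (cf. Volterra [21]; for
functionals of d.f.'s cf. von Mises [18]), the function $m(m-1) \cdots (m-d+1)\, \Psi_d(x_1, \ldots, x_d)$,
which is a functional of $F$, is a $d$-th derivative of $\theta(F)$ with respect to $F$ at the "point" $F$
of the space of d.f.'s.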

Returning to the case of identically distributed X's, we shall now prove some
inequalities satisfied by $\zeta_1, \ldots, \zeta_m$ and $\sigma^2(U)$ which are contained in the
following theorems:

Theorem 5.1. The quantities $\zeta_1, \ldots, \zeta_m$ as defined by (5.9) satisfy the
inequalities

(5.19)    $0 \le \frac{\zeta_c}{c} \le \frac{\zeta_d}{d}$  if  $1 \le c < d \le m$.

Theorem 5.2. The variance $\sigma^2(U_n)$ of a U-statistic $U_n = U(X_1, \ldots, X_n)$,
where $X_1, \ldots, X_n$ are independent and identically distributed, satisfies the
inequalities

(5.20)    $\frac{m^2}{n}\, \zeta_1 \le \sigma^2(U_n) \le \frac{m}{n}\, \zeta_m$.

$n\sigma^2(U_n)$ is a decreasing function of $n$,

(5.21)    $(n+1)\, \sigma^2(U_{n+1}) \le n\, \sigma^2(U_n)$,

which takes on its upper bound $m\zeta_m$ for $n = m$ and tends to its lower bound $m^2\zeta_1$
as $n$ increases:

(5.22)    $\sigma^2(U_m) = \zeta_m$,

(5.23)    $\lim_{n \to \infty} n\sigma^2(U_n) = m^2 \zeta_1$.

If $E\{U_n\} = \theta(F)$ is stationary of order $\ge d - 1$ for the d.f. of $X_\alpha$, (5.20) may
be replaced by

(5.24)    $\frac{m}{d}\, K_n(m, d)\, \zeta_d \le \sigma^2(U_n) \le K_n(m, d)\, \zeta_m$,

where

(5.25)    $K_n(m, d) = \binom{n}{m}^{-1} \sum_{c=d}^{m} \binom{m-1}{c-1} \binom{n-m}{m-c}$.

We postpone the proofs of Theorems 5.1 and 5.2.

(5.13) and (5.19) imply that a necessary and sufficient condition for the
existence of $\sigma^2(U)$ is the existence of

(5.26)    $\zeta_m = E\{\Phi^2(X_1, \ldots, X_m)\} - \theta^2$

or that of $E\{\Phi^2(X_1, \ldots, X_m)\}$.

If $\zeta_1 > 0$, $\sigma^2(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order $d$ for $F = F_0$, that is, if (5.11) is satisfied, $\sigma^2(U)$
is of order $n^{-d-1}$. Only if, for some $F = F_0$, $\theta(F)$ is stationary of order $m$, where
$m$ is the degree of $\theta(F)$, we have $\sigma^2(U) = 0$, and $U$ is equal to a constant with
probability 1.
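The variance formula $\sigma^2(U_n) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c}\binom{n-m}{m-c}\zeta_c$ and the bounds of Theorem 5.2 can be explored with exact rational arithmetic. The sketch below is illustrative only. The function name `u_variance` is hypothetical, and the numerical values $\zeta_1 = 1/720$, $\zeta_2 = 7/720$ are an assumption worked out for this example: they correspond to the kernel $(x_1 - x_2)^2/2$ under a uniform distribution on $(0, 1)$, where $\sigma^2 = 1/12$, $\mu_4 = 1/80$, $\zeta_1 = (\mu_4 - \sigma^4)/4$ and $\zeta_2 = (\mu_4 + \sigma^4)/2$.

```python
from fractions import Fraction
from math import comb

def u_variance(n, zetas):
    """Variance of U_n computed from zetas = [zeta_1, ..., zeta_m]
    by the formula (5.13)."""
    m = len(zetas)
    if n < m:
        raise ValueError("need n >= m")
    total = sum(comb(m, c) * comb(n - m, m - c) * zetas[c - 1]
                for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)

# Assumed example values (kernel (x1 - x2)^2 / 2, uniform parent on (0, 1)).
zetas = [Fraction(1, 720), Fraction(7, 720)]
m = len(zetas)

# n * sigma^2(U_n) for a range of sample sizes.
scaled = {n: n * u_variance(n, zetas) for n in range(m, 200)}
lower_bound = m * m * zetas[0]     # m^2 * zeta_1
upper_bound = m * zetas[-1]        # m * zeta_m
```

In this example `scaled[n]` starts at `upper_bound` for $n = m$, never increases, and stays above `lower_bound`, so using the limiting value $m^2\zeta_1/n$ for finite $n$ understates the variance.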

For instance, if $\theta(F_0) = 0$, the functional $\theta^2(F)$ is stationary for $F = F_0$.
Other examples of stationary "points" of a functional will be found in section 9d.

For proving Theorem 5.1 we shall require the following

Lemma 5.1. If

(5.27)    $\delta_d = \zeta_d - \binom{d}{1} \zeta_{d-1} + \binom{d}{2} \zeta_{d-2} - \cdots + (-1)^{d-1} \binom{d}{d-1} \zeta_1$,

we have

(5.28)    $\delta_d \ge 0$,    $(d = 1, \ldots, m)$,

and

(5.29)    $\zeta_d = \delta_d + \binom{d}{1} \delta_{d-1} + \cdots + \binom{d}{d-1} \delta_1$.

Proof. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$\eta_0 = \theta^2$,  $\eta_c = E\{\Phi_c^2(X_1, \ldots, X_c)\}$,    $(c = 1, \ldots, m)$.

Then, by (5.10),

$\zeta_c = \eta_c - \eta_0$,

and on substituting this in (5.27) we have

$\delta_d = \sum_{c=0}^{d} (-1)^{d-c} \binom{d}{c} \eta_c$.

From (5.9) it is seen that (5.28) is true for $d = 1$. Suppose that (5.28) holds
for $1, \ldots, d - 1$. Then (5.28) will be shown to hold for $d$.

Let

$\bar\Phi_0(x_1) = \Phi_1(x_1) - \theta$,

$\bar\Phi_c(x_1, x_2, \ldots, x_{c+1}) = \Phi_{c+1}(x_1, \ldots, x_{c+1}) - \Phi_c(x_2, \ldots, x_{c+1})$,    $(c = 1, \ldots, d - 1)$.

For an arbitrary fixed $x_1$, let

$\bar\eta_c(x_1) = E\{\bar\Phi_c^2(x_1, X_2, \ldots, X_{c+1})\}$,    $(c = 0, \ldots, d - 1)$.

Then, by induction hypothesis,

$\bar\delta_{d-1}(x_1) = \sum_{c=0}^{d-1} (-1)^{d-1-c} \binom{d-1}{c} \bar\eta_c(x_1) \ge 0$

for any fixed $x_1$.

Now,

$E\{\bar\eta_c(X_1)\} = \eta_{c+1} - \eta_c$,

and hence

$E\{\bar\delta_{d-1}(X_1)\} = \sum_{c=0}^{d-1} (-1)^{d-1-c} \binom{d-1}{c} (\eta_{c+1} - \eta_c) = \sum_{c=0}^{d} (-1)^{d-c} \binom{d}{c} \eta_c = \delta_d$.

The proof of Lemma 5.1 is complete.

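Proof of Theorem 5.1. By (5.29) we have for $c < d$

(5.30)    $c\zeta_d - d\zeta_c = \sum_{a=1}^{c} \Big[ c\binom{d}{a} - d\binom{c}{a} \Big] \delta_a + c \sum_{a=c+1}^{d} \binom{d}{a} \delta_a$.

From (5.28), and since $c\binom{d}{a} - d\binom{c}{a} \ge 0$ if $1 \le a \le c \le d$, it follows that each
term in the two sums of (5.30) is not negative. This, in connection with (5.9),
proves Theorem 5.1.

Proof of Theorem 5.2. From (5.19) we have

$c\, \zeta_1 \le \zeta_c \le \frac{c}{m}\, \zeta_m$,    $(c = 1, \ldots, m)$.

Applying these inequalities to each term in (5.13) and using the identity

(5.31)    $\binom{n}{m}^{-1} \sum_{c=1}^{m} c \binom{m}{c} \binom{n-m}{m-c} = \frac{m^2}{n}$,

we obtain (5.20).

(5.22) and (5.23) follow immediately from (5.13).

For (5.21) we may write

(5.32)    $D_n \ge 0$,

where

$D_n = n\sigma^2(U_n) - (n+1)\sigma^2(U_{n+1})$.

Let

$D_n = \sum_{c=1}^{m} d_{n,c}\, \zeta_c$.

Then we have from (5.13)

(5.33)    $d_{n,c} = n \binom{n}{m}^{-1} \binom{m}{c} \binom{n-m}{m-c} - (n+1) \binom{n+1}{m}^{-1} \binom{m}{c} \binom{n+1-m}{m-c}$
          $= \binom{m}{c} \binom{n-m+1}{m-c} (n-m+1)^{-1} \binom{n}{m}^{-1} \{(c-1)n - (m-1)^2\}$,    $(1 \le c \le m \le n)$.

Putting

$c_0 = 1 + \Big[ \frac{(m-1)^2}{n} \Big]$,

where $[u]$ denotes the largest integer $\le u$, we have

$d_{n,c} \le 0$ if $c \le c_0$,

$d_{n,c} > 0$ if $c > c_0$.

Hence, by (5.19),

$d_{n,c}\, \zeta_c \ge \frac{1}{c_0}\, \zeta_{c_0}\, c\, d_{n,c}$,    $(c = 1, \ldots, m)$,

and

$D_n \ge \frac{1}{c_0}\, \zeta_{c_0} \sum_{c=1}^{m} c\, d_{n,c}$.

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).

For the stationary case $\zeta_1 = \cdots = \zeta_{d-1} = 0$, (5.24) is a direct consequence of
(5.13) and (5.19). The proof of Theorem 5.2 is complete.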
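The combinatorial facts used in the proof of Theorem 5.2 lend themselves to a quick exact check: the identity $\binom{n}{m}^{-1}\sum_c c\binom{m}{c}\binom{n-m}{m-c} = m^2/n$, the sign pattern of the coefficients $d_{n,c}$ around $c_0 = 1 + [(m-1)^2/n]$, and the vanishing of $\sum_c c\, d_{n,c}$. The sketch below is illustrative; the helper names `weight`, `d_coeff` and `check`, and the range of $(n, m)$ values tested, are assumptions of this example.

```python
from fractions import Fraction
from math import comb

def weight(n, m, c):
    """Coefficient of zeta_c in n * sigma^2(U_n), read off from (5.13)."""
    return Fraction(n * comb(m, c) * comb(n - m, m - c), comb(n, m))

def d_coeff(n, m, c):
    """Coefficient d_{n,c} of zeta_c in D_n = n s^2(U_n) - (n+1) s^2(U_{n+1})."""
    return weight(n, m, c) - weight(n + 1, m, c)

def check(n, m):
    cs = range(1, m + 1)
    # Identity (5.31), multiplied through by n.
    identity = sum(c * weight(n, m, c) for c in cs) == m * m
    # Sign pattern: non-positive up to c0, non-negative beyond it.
    c0 = 1 + (m - 1) ** 2 // n
    signs = all(d_coeff(n, m, c) <= 0 if c <= c0 else d_coeff(n, m, c) >= 0
                for c in cs)
    # The weighted sum of the d_{n,c} vanishes.
    vanishing = sum(c * d_coeff(n, m, c) for c in cs) == 0
    return identity and signs and vanishing

all_ok = all(check(n, m) for m in range(1, 7) for n in range(m, 40))
```

Exact fractions are used so that the equalities are tested without rounding error; a finite check of this kind is of course no substitute for the proof.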

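6. The covariance of two U-statistics. Consider a set of $g$ U-statistics,

$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1} \sum' \Phi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})$,    $(\gamma = 1, \ldots, g)$,

each $U^{(\gamma)}$ being a function of the same $n$ independent, identically distributed
random vectors $X_1, \ldots, X_n$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric
in its $m(\gamma)$ arguments $(\gamma = 1, \ldots, g)$.

Let

$E\{U^{(\gamma)}\} = E\{\Phi^{(\gamma)}(X_1, \ldots, X_{m(\gamma)})\} = \theta^{(\gamma)}$,    $(\gamma = 1, \ldots, g)$;

(6.1)    $\Psi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)}) = \Phi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)}) - \theta^{(\gamma)}$,    $(\gamma = 1, \ldots, g)$;

(6.2)    $\Psi_c^{(\gamma)}(x_1, \ldots, x_c) = E\{\Psi^{(\gamma)}(x_1, \ldots, x_c, X_{c+1}, \ldots, X_{m(\gamma)})\}$,    $(c = 1, \ldots, m(\gamma);\ \gamma = 1, \ldots, g)$;

(6.3)    $\zeta_c^{(\gamma,\delta)} = E\{\Psi_c^{(\gamma)}(X_1, \ldots, X_c)\, \Psi_c^{(\delta)}(X_1, \ldots, X_c)\}$,    $(\gamma, \delta = 1, \ldots, g)$.

If, in particular, $\gamma = \delta$, we shall write

(6.4)    $\zeta_c^{(\gamma)} = \zeta_c^{(\gamma,\gamma)} = E\{\Psi_c^{(\gamma)}(X_1, \ldots, X_c)\}^2$.

Let

$\sigma(U^{(\gamma)}, U^{(\delta)}) = E\{(U^{(\gamma)} - \theta^{(\gamma)})(U^{(\delta)} - \theta^{(\delta)})\}$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma) \le m(\delta)$,

(6.5)    $\sigma(U^{(\gamma)}, U^{(\delta)}) = \binom{n}{m(\gamma)}^{-1} \sum_{c=1}^{m(\gamma)} \binom{m(\delta)}{c} \binom{n - m(\delta)}{m(\gamma) - c}\, \zeta_c^{(\gamma,\delta)}$.

The right hand side is easily seen to be symmetric in $\gamma$, $\delta$.

For $\gamma = \delta$, (6.5) is the variance of $U^{(\gamma)}$ (cf. (5.13)).

We have from (5.23) and (6.5)

$\lim_{n \to \infty} n\sigma^2(U^{(\gamma)}) = m^2(\gamma)\, \zeta_1^{(\gamma)}$,

$\lim_{n \to \infty} n\sigma(U^{(\gamma)}, U^{(\delta)}) = m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)}$.

Hence, if $\zeta_1^{(\gamma)} \ne 0$ and $\zeta_1^{(\delta)} \ne 0$, the product moment correlation $\rho(U^{(\gamma)}, U^{(\delta)})$
between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

(6.6)    $\lim_{n \to \infty} \rho(U^{(\gamma)}, U^{(\delta)}) = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)} \zeta_1^{(\delta)}}}$.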

7. Limit theorems for the case of identically distributed $X_\alpha$'s. We shall now
study the asymptotic distribution of U-statistics and certain related functions.
In this section the vectors $X_\alpha$ will be assumed to be identically distributed. An
extension to the case of different parent distributions will be given in section 8.

Following Cramér [2, p. 83] we shall say that a sequence of d.f.'s $F_1(x)$,
$F_2(x), \ldots$ converges to a d.f. $F(x)$ if $\lim F_n(x) = F(x)$ in every point at which
the one-dimensional marginal limiting d.f.'s are continuous.

Let us recall (cf. Cramér [2, p. 312]) that a $g$-variate normal distribution is
called non-singular if the rank $r$ of its covariance matrix is equal to $g$, and singular
if $r < g$.

The following lemma will be used in the proofs.

Lemma 7.1. Let $V_1, V_2, \ldots$ be an infinite sequence of random vectors
$V_n = (V_n^{(1)}, \ldots, V_n^{(g)})$, and suppose that the d.f. $F_n(v)$ of $V_n$ tends to a d.f. $F(v)$ as
$n \to \infty$. Let $V_n^{(\gamma)\prime} = V_n^{(\gamma)} + d_n^{(\gamma)}$, where

(7.1)    $\lim_{n \to \infty} E\{d_n^{(\gamma)}\}^2 = 0$,    $(\gamma = 1, \ldots, g)$.

Then the d.f. of $V_n' = (V_n^{(1)\prime}, \ldots, V_n^{(g)\prime})$ tends to $F(v)$.

This is an immediate consequence of the well-known fact that the d.f. of $V_n'$
tends to $F(v)$ if $d_n^{(\gamma)}$ converges in probability to 0 (cf. Cramér [2, p. 299]), since
the fulfillment of (7.1) is sufficient for the latter condition.

Theorem 7.1. Let $X_1, \ldots, X_n$ be $n$ independent, identically distributed random
vectors,

$X_\alpha = (X_\alpha^{(1)}, \ldots, X_\alpha^{(r)})$,    $(\alpha = 1, \ldots, n)$.

Let

$\Phi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)})$,    $(\gamma = 1, \ldots, g)$,

be $g$ real-valued functions not involving $n$, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)$ ($\le n$)
vector arguments $x_\alpha = (x_\alpha^{(1)}, \ldots, x_\alpha^{(r)})$, $(\alpha = 1, \ldots, m(\gamma);\ \gamma = 1, \ldots, g)$.
Define

(7.2)    $U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1} \sum' \Phi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})$,    $(\gamma = 1, \ldots, g)$,

where the summation is over all subscripts such that $1 \le \alpha_1 < \cdots < \alpha_{m(\gamma)} \le n$.
Then, if the expected values

(7.3)    $\theta^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \ldots, X_{m(\gamma)})\}$,    $(\gamma = 1, \ldots, g)$,

and

(7.4)    $E\{\Phi^{(\gamma)}(X_1, \ldots, X_{m(\gamma)})\}^2$,    $(\gamma = 1, \ldots, g)$,

exist, the joint d.f. of

$\sqrt{n}(U^{(1)} - \theta^{(1)}), \ldots, \sqrt{n}(U^{(g)} - \theta^{(g)})$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix
$(m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)})$, where $\zeta_1^{(\gamma,\delta)}$ is defined by (6.3). The limiting distribution is
non-singular if the determinant $|\zeta_1^{(\gamma,\delta)}|$ is positive.

Before proving Theorem 7.1, a few words may be said about its meaning and
its relation to well-known results.

For $g = 1$, Theorem 7.1 states that the distribution of a U-statistic tends, under
certain conditions, to the normal form. For $m = 1$, $U$ is the sum of $n$ independent
random variables, and in this case Theorem 7.1 reduces to the Central
Limit Theorem for such sums. For $m > 1$, $U$ is a sum of random variables
which, in general, are not independent. Under certain assumptions about the
function $\Phi(x_1, \ldots, x_m)$ the asymptotic normality of $U$ can be inferred from
the Central Limit Theorem by well-known methods. If, for instance, $\Phi$ is a
polynomial (as in the case of the k-statistics or the unbiased estimates of moments),
$U$ can be expressed as a polynomial in moments about the origin which
are sums of independent random variables, and for this case the tendency to
normality of $U$ can easily be shown (cf. Cramér [2, p. 365]).

Theorem 7.1 generalizes these results, stating that in the case of independent
and identically distributed $X_\alpha$'s the existence of $E\{\Phi^2(X_1, \ldots, X_m)\}$ is sufficient
for the asymptotic normality of $U$. No regularity conditions are imposed on the
function $\Phi$. This point is important for some applications (cf. section 9).
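A small simulation can make the statement of Theorem 7.1 concrete for $g = 1$: the variance of $\sqrt{n}(U - \theta)$ should be close to $m^2\zeta_1$ for large $n$. The sketch below is illustrative only. It assumes the kernel $(x_1 - x_2)^2/2$ (whose U-statistic is the usual sample variance) and a uniform parent on $(0, 1)$, for which $\zeta_1 = (\mu_4 - \sigma^4)/4 = 1/720$; the function names, sample size, replication count and seed are arbitrary choices for this example.

```python
import random

def sample_variance(xs):
    """U-statistic with kernel (x1 - x2)^2 / 2, i.e. the unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def simulate_scaled_variance(n, reps, seed=0):
    """Monte Carlo estimate of n * Var(U_n) for uniform(0, 1) observations."""
    rng = random.Random(seed)
    values = [sample_variance([rng.random() for _ in range(n)])
              for _ in range(reps)]
    return n * sample_variance(values)

# Limiting value m^2 * zeta_1 with m = 2 and zeta_1 = (1/80 - 1/144) / 4.
limit = 4 * (1 / 80 - 1 / 144) / 4
estimate = simulate_scaled_variance(n=200, reps=2000)
```

The Monte Carlo `estimate` fluctuates around a value slightly above `limit`, in line with the fact that $n\sigma^2(U_n)$ decreases towards $m^2\zeta_1$; a histogram of the simulated values of $\sqrt{n}(U - \theta)$ would show the approach to the normal form.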

Theorem 7.1 and the following theorems of sections 7 and 8 are closely related
to recent results of von Mises [18] which were published after this paper was
essentially completed. It will be seen below (Theorem 7.4) that the limiting
distribution of $\sqrt{n}[U - \theta(F)]$ is the same as that of $\sqrt{n}[\theta(S) - \theta(F)]$ (cf. (4.5))
if the variance of $\theta(S)$ exists. $\theta(S)$ is a differentiable statistical function in the
sense of von Mises, and by Theorem I of [18], $\sqrt{n}[\theta(S) - \theta(F)]$ is asymptotically
normal if certain conditions are satisfied. It will be found that in certain cases,
for instance if the kernel $\Phi$ of $\theta$ is a polynomial, the conditions of the theorems of
sections 7 and 8 are somewhat weaker than those of von Mises' theorem.
Though von Mises' paper is concerned with functionals of univariate d.f.'s only,
its results can easily be extended to the multivariate case.

For the particular case of a discrete population (where $F$ is a step function),
$U$ and $\theta(S)$ are polynomials in the sample frequencies, and their asymptotic
distribution may be inferred from the fact that the joint distribution of the
frequencies tends to the normal form (cf. also von Mises [18]).

In Theorem 7.1 the functions $\Phi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)})$ are supposed to be symmetric.
Since, as has been seen in section 4, any U-statistic with non-symmetric
kernel can be written in the form (4.4) with a symmetric kernel, this restriction
is not essential and has been made only for the sake of convenience. Moreover,
in the condition of the existence of $E\{\Phi^2(X_1, \ldots, X_m)\}$, the symmetric kernel
may be replaced by a non-symmetric one. For, if $\Phi$ is non-symmetric, and $\Phi_0$ is
the symmetric kernel defined by (3.3), $E\{\Phi_0^2(X_1, \ldots, X_m)\}$ is a linear combination
of terms of the form $E\{\Phi(X_{\alpha_1}, \ldots, X_{\alpha_m})\, \Phi(X_{\beta_1}, \ldots, X_{\beta_m})\}$, whose existence
follows from that of $E\{\Phi^2(X_1, \ldots, X_m)\}$ by Schwarz's inequality.

If the regular functional $\theta(F)$ is stationary for $F = F_0$, that is, if $\zeta_1(F_0) = 0$
(cf. section 5), the limiting normal distribution of $\sqrt{n}(U - \theta)$ is, according to
Theorem 7.1, singular, that is, its variance is zero. As has been seen in section
5, $\sigma^2(U)$ need not be zero in this case, but may be of some order $n^{-c}$,
$(c = 2, 3, \ldots, m)$, and the distribution of $n^{c/2}(U - \theta)$ may tend to a limiting
form which is not normal. According to von Mises [18], it is a limiting distribution
of type $c$, $(c = 2, 3, \ldots)$.

According to Theorem 5.2, $\sigma^2(U)$ exceeds its asymptotic value $m^2\zeta_1/n$ for any
finite $n$. Hence, if we apply Theorem 7.1 for approximating the distribution of
$U$ when $n$ is large but finite, we underestimate the variance of $U$. For many
applications this is undesirable, and for such cases the following theorem, which
is an immediate consequence of Theorem 7.1, will be more useful.

Theorem 7.2. Under the conditions of Theorem 7.1, and if

$\zeta_1^{(\gamma)} > 0$,    $(\gamma = 1, \ldots, g)$,

the joint d.f. of

$(U^{(1)} - \theta^{(1)})/\sigma(U^{(1)}), \ldots, (U^{(g)} - \theta^{(g)})/\sigma(U^{(g)})$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix
$(\rho^{(\gamma,\delta)})$, where

$\rho^{(\gamma,\delta)} = \lim_{n \to \infty} \rho(U^{(\gamma)}, U^{(\delta)}) = \frac{\zeta_1^{(\gamma,\delta)}}{\sqrt{\zeta_1^{(\gamma)} \zeta_1^{(\delta)}}}$,    $(\gamma, \delta = 1, \ldots, g)$.


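Proof of Theorem 7.1. The existence of (7.4) entails that of

$\zeta_{m(\gamma)}^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \ldots, X_{m(\gamma)})\}^2 - (\theta^{(\gamma)})^2$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of
$\zeta_1^{(\gamma)}, \ldots, \zeta_{m(\gamma)-1}^{(\gamma)}$, of $\sigma^2(U^{(\gamma)})$, and of $\zeta_1^{(\gamma,\delta)} \le \sqrt{\zeta_1^{(\gamma)} \zeta_1^{(\delta)}}$.

Now, consider the $g$ quantities

$Y^{(\gamma)} = \frac{m(\gamma)}{\sqrt{n}} \sum_{\alpha=1}^{n} \Psi_1^{(\gamma)}(X_\alpha)$,    $(\gamma = 1, \ldots, g)$,

where $\Psi_1^{(\gamma)}(x)$ is defined by (6.2). $Y^{(1)}, \ldots, Y^{(g)}$ are sums of $n$ independent
random variables with zero means, whose covariance matrix, by virtue of (6.3), is

(7.5)    $\{\sigma(Y^{(\gamma)}, Y^{(\delta)})\} = \{m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)}\}$.

By the Central Limit Theorem for vectors (cf. Cramér [1, p. 112]), the joint d.f.
of $(Y^{(1)}, \ldots, Y^{(g)})$ tends to the normal $g$-variate d.f. with the same means and
covariances.

Theorem 7.1 will be proved by showing that the $g$ random variables

(7.6)    $Z^{(\gamma)} = \sqrt{n}(U^{(\gamma)} - \theta^{(\gamma)})$,    $(\gamma = 1, \ldots, g)$,

have the same joint limiting distribution as $Y^{(1)}, \ldots, Y^{(g)}$.

According to Lemma 7.1 it is sufficient to show that

(7.7)    $\lim_{n \to \infty} E(Z^{(\gamma)} - Y^{(\gamma)})^2 = 0$,    $(\gamma = 1, \ldots, g)$.

For proving (7.7), write

(7.8)    $E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = E\{Z^{(\gamma)}\}^2 + E\{Y^{(\gamma)}\}^2 - 2E\{Z^{(\gamma)} Y^{(\gamma)}\}$.

By (5.13) we have

(7.9)    $E\{Z^{(\gamma)}\}^2 = n\sigma^2(U^{(\gamma)}) = m^2(\gamma)\, \zeta_1^{(\gamma)} + O(n^{-1})$,

and from (7.5),

(7.10)    $E\{Y^{(\gamma)}\}^2 = m^2(\gamma)\, \zeta_1^{(\gamma)}$.

By (7.2) and (6.1) we may write for (7.6)

$Z^{(\gamma)} = \sqrt{n} \binom{n}{m(\gamma)}^{-1} \sum' \Psi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})$,

and hence

$E\{Z^{(\gamma)} Y^{(\gamma)}\} = m(\gamma) \binom{n}{m(\gamma)}^{-1} \sum_{\alpha=1}^{n} \sum' E\{\Psi_1^{(\gamma)}(X_\alpha)\, \Psi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})\}$.

The term

$E\{\Psi_1^{(\gamma)}(X_\alpha)\, \Psi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})\}$

is $= \zeta_1^{(\gamma)}$ if

(7.11)    $\alpha_1 = \alpha$  or  $\alpha_2 = \alpha$  $\cdots$  or  $\alpha_{m(\gamma)} = \alpha$

and 0 otherwise. For a fixed $\alpha$, the number of sets $\{\alpha_1, \ldots, \alpha_{m(\gamma)}\}$ such that
$1 \le \alpha_1 < \cdots < \alpha_{m(\gamma)} \le n$ and (7.11) is satisfied, is $\binom{n-1}{m(\gamma)-1}$. Thus,

(7.12)    $E\{Z^{(\gamma)} Y^{(\gamma)}\} = m(\gamma) \binom{n}{m(\gamma)}^{-1} n \binom{n-1}{m(\gamma)-1} \zeta_1^{(\gamma)} = m^2(\gamma)\, \zeta_1^{(\gamma)}$.

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.

The concluding remark in Theorem 7.1 is a direct consequence of the definition
of a non-singular distribution. The proof of Theorem 7.1 is complete.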
Theorems 7.1 and 7.2 deal with the asymptotic distribution of $U^{(1)}, \ldots, U^{(g)}$,
which are unbiased estimates of $\theta^{(1)}, \ldots, \theta^{(g)}$. The unbiasedness of a statistic
is, of course, irrelevant for its asymptotic behavior, and the application of Lemma
7.1 leads immediately to the following extension of Theorem 7.1 to a larger class
of statistics.

Theorem 7.3. Let

(7.13)    $U^{(\gamma)\prime} = U^{(\gamma)} + \frac{b_n^{(\gamma)}}{\sqrt{n}}$,    $(\gamma = 1, \ldots, g)$,

where $U^{(\gamma)}$ is defined by (7.2) and $b_n^{(\gamma)}$ is a random variable. If the conditions of
Theorem 7.1 are satisfied, and $\lim E\{b_n^{(\gamma)}\}^2 = 0$, $(\gamma = 1, \ldots, g)$, then the joint
distribution of

$\sqrt{n}(U^{(1)\prime} - \theta^{(1)}), \ldots, \sqrt{n}(U^{(g)\prime} - \theta^{(g)})$

tends to the normal distribution with zero means and covariance matrix

$\{m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)}\}$.

This theorem applies, in particular, to the regular functionals $\theta(S)$ of the
sample d.f.,

$\theta(S) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(X_{\alpha_1}, \ldots, X_{\alpha_m})$,

in the case that the variance of $\theta(S)$ exists. For we may write

$n^m \theta(S) = m! \binom{n}{m} U + \sum^* \Phi(X_{\alpha_1}, \ldots, X_{\alpha_m})$,

where the sum $\sum^*$ is extended over all $m$-tuplets $(\alpha_1, \ldots, \alpha_m)$ in which at least
one equality $\alpha_i = \alpha_j$ $(i \ne j)$ is satisfied. The number of terms in $\sum^*$ is of order
$n^{m-1}$. Hence

$\theta(S) - U = \frac{1}{n} D$,

where the expected value $E\{D^2\}$, whose existence follows from that of $\sigma^2\{\theta(S)\}$,
is bounded for $n \to \infty$. Thus, if we put $U^{(\gamma)\prime} = \theta^{(\gamma)}(S)$, the conditions
of Theorem 7.3 are fulfilled. We may summarize this result as follows:

Theorem 7.4. Let $X_1, \ldots, X_n$ be a random sample from an $r$-variate population
with d.f. $F(x) = F(x^{(1)}, \ldots, x^{(r)})$, and let

$\theta^{(\gamma)}(F) = \int \cdots \int \Phi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)})\, dF(x_1) \cdots dF(x_{m(\gamma)})$,    $(\gamma = 1, \ldots, g)$,

be $g$ regular functionals of $F$, where $\Phi^{(\gamma)}(x_1, \ldots, x_{m(\gamma)})$ is symmetric in the vectors
$x_1, \ldots, x_{m(\gamma)}$ and does not involve $n$. If $S(x)$ is the d.f. of the random sample,
and if the variance of

$\theta^{(\gamma)}(S) = \frac{1}{n^{m(\gamma)}} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_{m(\gamma)}=1}^{n} \Phi^{(\gamma)}(X_{\alpha_1}, \ldots, X_{\alpha_{m(\gamma)}})$

exists, the joint d.f. of

$\sqrt{n}\{\theta^{(1)}(S) - \theta^{(1)}(F)\}, \ldots, \sqrt{n}\{\theta^{(g)}(S) - \theta^{(g)}(F)\}$

tends to the $g$-variate normal d.f. with zero means and covariance matrix
$\{m(\gamma) m(\delta)\, \zeta_1^{(\gamma,\delta)}\}$.

The following theorem is concerned with the asymptotic distribution of a
function of statistics of the form $U$ or $U'$.

Theorem 7.5. Let $(U') = (U^{(1)\prime}, \ldots, U^{(g)\prime})$ be a random vector, where $U^{(\gamma)\prime}$
is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied.
If the function $h(y) = h(y^{(1)}, \ldots, y^{(g)})$ does not involve $n$ and is continuous together
with its second order partial derivatives in some neighborhood of the point $(y) = (\theta) =
(\theta^{(1)}, \ldots, \theta^{(g)})$, then the distribution of the random variable $\sqrt{n}\{h(U') - h(\theta)\}$
tends to the normal distribution with mean zero and variance

$\sum_{\gamma=1}^{g} \sum_{\delta=1}^{g} m(\gamma) m(\delta) \Big( \frac{\partial h}{\partial y^{(\gamma)}} \Big)_{y=\theta} \Big( \frac{\partial h}{\partial y^{(\delta)}} \Big)_{y=\theta} \zeta_1^{(\gamma,\delta)}$.

Theorem 7.5 follows from Theorem 7.3 in exactly the same way as the theorem
on the asymptotic distribution of a function of moments follows from the fact
of their asymptotic normality; cf. Cramér [2, p. 366]. We shall therefore omit
the proof of Theorem 7.5. Since any moment whose variance exists has the
form $U' = \theta(S)$ (cf. section 4 and Theorem 7.4), Theorem 7.5 is a generalization
of the theorem on a function of moments.

8. Limit theorems for $U(X_1, \dots, X_n)$ when the $X_\alpha$'s have different distributions. The limit theorems of the preceding section can be extended to the case when the $X_\alpha$'s have different distributions. We shall only prove an extension to this case of Theorem 7.1 (or 7.2), confining ourselves, for the sake of simplicity, to the distribution of a single $U$-statistic.

The extension of Theorems 7.3 and 7.5 with $g = 1$ to this case is immediate. One has only to replace the reference to Theorem 7.1 by that to the following Theorem 8.1, and $\theta$ and $\zeta_1$ by $E\{U\}$ and $\zeta_{1,n}$.

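Theorem 8.1. Let $X_1, \dots, X_n$ be $n$ independent random vectors of $r$ components, $X_\alpha$ having the d.f. $F_\alpha(x) = F_\alpha(x^{(1)}, \dots, x^{(r)})$. Let $\Phi(x_1, \dots, x_m)$ be a function symmetric in its $m$ vector arguments $x_\beta = (x_\beta^{(1)}, \dots, x_\beta^{(r)})$ which does not involve $n$, and let

$$\bar\Psi_{1(\nu)}(x) = \binom{n-1}{m-1}^{-1} \sum_{(\neq \nu)} \Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}(x), \qquad (\nu = 1, \dots, n), \tag{8.1}$$

where $\Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}$ is defined by (5.15), and the summation is extended over all subscripts $\alpha$ such that

$$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_{m-1} \le n, \qquad \alpha_i \neq \nu, \qquad (i = 1, \dots, m-1).$$

Suppose that there is a number $A$ such that for every $n = 1, 2, \dots$

$$\int \cdots \int \Phi^2(x_1, \dots, x_m)\, dF_{\alpha_1}(x_1) \cdots dF_{\alpha_m}(x_m) < A, \qquad (1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n), \tag{8.2}$$

that

$$E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 < \infty, \qquad (\nu = 1, 2, \dots, n), \tag{8.3}$$

and

$$\lim_{n \to \infty} \sum_{\nu=1}^{n} E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 \Big/ \Bigl\{ \sum_{\nu=1}^{n} E\,\bar\Psi_{1(\nu)}^2(X_\nu) \Bigr\}^{3/2} = 0. \tag{8.4}$$

Then, as $n \to \infty$, the d.f. of $(U - E\{U\})/\sigma(U)$ tends to the normal d.f. with mean 0 and variance 1.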

The proof is similar to that of Theorem 7.1. 

Let

$$W = \frac{m}{n} \sum_{\nu=1}^{n} \bar\Psi_{1(\nu)}(X_\nu).$$

It will be shown that

(a) the d.f. of

$$V = \frac{W - E\{W\}}{\sigma(W)}$$

tends to the normal d.f. with mean 0 and variance 1, and that

(b) the d.f. of

$$V' = \frac{U - E\{U\}}{\sigma(U)}$$

tends to the same limit as the d.f. of $V$.

Part (a) follows immediately from (8.3) and (8.4) by Liapounoff's form of the Central Limit Theorem.

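According to Lemma 7.1, (b) will be proved when it is shown that

$$\lim_{n \to \infty} E\{V' - V\}^2 = \lim_{n \to \infty} 2\left\{1 - \frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)}\right\} = 0$$

or

$$\lim_{n \to \infty} \frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = 1. \tag{8.5}$$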


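Let $c$ be an integer, $1 \le c \le m$, and write

$$x = (x_1, \dots, x_c), \qquad y = (y_1, \dots, y_{m-c}), \qquad z = (z_1, \dots, z_{m-c}),$$

$$F_{(\alpha)}(x) = F_{\alpha_1}(x_1) \cdots F_{\alpha_c}(x_c), \qquad F_{(\beta)}(y) = F_{\beta_1}(y_1) \cdots F_{\beta_{m-c}}(y_{m-c}), \qquad F_{(\gamma)}(z) = F_{\gamma_1}(z_1) \cdots F_{\gamma_{m-c}}(z_{m-c}).$$

Then, by Schwarz's inequality,

$$\left| \int \cdots \int \Phi(x, y)\, \Phi(x, z)\, dF_{(\alpha)}(x)\, dF_{(\beta)}(y)\, dF_{(\gamma)}(z) \right| \le \left\{ \int \cdots \int \Phi^2(x, y)\, dF_{(\alpha)}(x)\, dF_{(\beta)}(y) \cdot \int \cdots \int \Phi^2(x, z)\, dF_{(\alpha)}(x)\, dF_{(\gamma)}(z) \right\}^{1/2},$$

which, by (8.2), is $< A$ for any set of subscripts.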

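By the inequality for moments, $\theta_{\alpha_1, \dots, \alpha_m}$, as defined by (5.14), is also uniformly bounded, and applying these inequalities to (5.16), it follows that there exists a number $B$ such that

$$\bigl|\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\, \gamma_1, \dots, \gamma_{m-c}}\bigr| < B, \qquad (c = 1, \dots, m), \tag{8.6}$$

for every set of subscripts satisfying the inequalities

$$\alpha_g \neq \alpha_h, \quad \beta_g \neq \beta_h, \quad \gamma_g \neq \gamma_h \quad (g \neq h); \qquad \alpha_i \neq \beta_j, \quad \alpha_i \neq \gamma_j, \qquad (i = 1, \dots, c;\ j = 1, \dots, m - c).$$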





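Now, we have

$$E\{W\} = 0$$

and

$$\sigma^2(W) = \frac{m^2}{n^2} \sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\}, \tag{8.7}$$

or, inserting (8.1) and recalling (5.16),

$$\sigma^2(W) = \frac{m^2}{n^2} \binom{n-1}{m-1}^{-2} \sum_{\nu=1}^{n} \sum_{(\neq \nu)}{}' \sum_{(\neq \nu)}{}' \zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\, \beta_1, \dots, \beta_{m-1}}, \tag{8.8}$$

the two sums $\sum'$ being over $\alpha_1 < \cdots < \alpha_{m-1}$, $(\alpha_i \neq \nu)$, and $\beta_1 < \cdots < \beta_{m-1}$, $(\beta_i \neq \nu)$, respectively. By (5.17), the sum of the terms whose subscripts $\nu, \alpha_1, \dots, \alpha_{m-1}, \beta_1, \dots, \beta_{m-1}$ are all different is equal to

$$\frac{n(n-1) \cdots (n - 2m + 2)}{(m-1)!\,(m-1)!}\, \zeta_{1,n} = n \binom{n-1}{m-1} \binom{n-m}{m-1} \zeta_{1,n}.$$

The number of the remaining terms is of order $n^{2m-2}$. Since, by (8.6), they are uniformly bounded, we have

$$\sigma^2(W) = \frac{m^2}{n}\, \zeta_{1,n} + O(n^{-2}). \tag{8.9}$$

Similarly, we have from (5.18)

$$\sigma^2(U) = \frac{m^2}{n}\, \zeta_{1,n} + O(n^{-2}),$$

and hence

$$\sigma^2(U) = \sigma^2(W) + O(n^{-2}). \tag{8.10}$$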
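The covariance of $U$ and $W$ is

$$\sigma(U, W) = \frac{m}{n} \binom{n}{m}^{-1} \sum_{\nu=1}^{n} \sum{}' E\{\bar\Psi_{1(\nu)}(X_\nu)\, \Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\}. \tag{8.11}$$

All terms except those in which one of the $\alpha$'s $= \nu$, vanish, and for the remaining ones we have, for fixed $\alpha_1, \dots, \alpha_m$,

$$E\{\bar\Psi_{1(\nu)}(X_\nu)\, \Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\} = \binom{n-1}{m-1}^{-1} \sum_{(\neq \nu)}{}' \zeta_{1(\nu)\gamma_1, \dots, \gamma_{m-1};\, \beta_1, \dots, \beta_{m-1}},$$

where the summation sign refers to the $\beta$'s, and $\gamma_1, \dots, \gamma_{m-1}$ are the $\alpha$'s that are $\neq \nu$. Inserting this in (8.11) and comparing the result with (8.8), we see that

$$\sigma(U, W) = \sigma^2(W). \tag{8.12}$$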





From (8.12) and (8.10) we have

$$\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = \frac{\sigma(W)}{\sigma(U)} = \frac{n\,\sigma(W)}{\{n^2 \sigma^2(W) + O(1)\}^{1/2}}.$$

Comparing condition (8.4) with (8.7), we see that we must have $n\,\sigma(W) \to \infty$ as $n \to \infty$. This shows the truth of (8.5). The proof of Theorem 8.1 is complete.

For some purposes the following corollary of Theorem 8.1 will be useful, where the conditions (8.2), (8.3), and (8.4) are replaced by other conditions which are more restrictive, but easier to apply.

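Theorem 8.2. Theorem 8.1 holds if the conditions (8.2), (8.3), and (8.4) are replaced by the following:

There exist two positive numbers $C$, $D$ such that

$$\int \cdots \int |\Phi(x_1, \dots, x_m)|^3\, dF_{\alpha_1}(x_1) \cdots dF_{\alpha_m}(x_m) < C \tag{8.13}$$

for $\alpha_i = 1, 2, \dots$, $(i = 1, \dots, m)$, and

$$\zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\, \beta_1, \dots, \beta_{m-1}} > D \tag{8.14}$$

for any subscripts satisfying

$$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_{m-1}, \qquad 1 \le \beta_1 < \beta_2 < \cdots < \beta_{m-1}, \qquad 1 \le \nu \neq \alpha_i, \beta_i.$$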

We have to show that (8.2), (8.3), and (8.4) follow from (8.13) and (8.14). (8.13) implies (8.2) by the inequality for moments. By a reasoning analogous to that used in the previous proof, applying Hölder's inequality instead of Schwarz's inequality, it follows from (8.13) that

$$E\,|\bar\Psi_{1(\nu)}(X_\nu)|^3 < C'. \tag{8.15}$$

On the other hand, by (8.7), (8.8), and (8.14),

$$\sum_{\nu=1}^{n} E\{\bar\Psi_{1(\nu)}^2(X_\nu)\} > nD. \tag{8.16}$$

(8.15) and (8.16) are sufficient for the fulfillment of (8.4).


9. Applications to particular statistics. 

(a) Moments and functions of moments. It has been seen in section 4 that the $k$-statistics and the unbiased estimates of moments are $U$-statistics, while the sample moments are regular functionals of the sample d.f. By Theorems 7.1, 8.1, and 7.4 these statistics are asymptotically normally distributed, and by Theorem 7.5 the same is true for a function of moments, if the respective conditions are satisfied. These results are not new (cf., for example, Cramér [2]).

(b) Mean difference and coefficient of concentration. If $Y_1, \dots, Y_n$ are $n$ independent real-valued random variables, Gini's mean difference (without repetition) is defined by

$$d = \frac{1}{n(n-1)} \sum_{\alpha \neq \beta} |Y_\alpha - Y_\beta|.$$

If the $Y_\alpha$'s have the same distribution $F$, the mean of $d$ is

$$\delta = \int\!\!\int |y_1 - y_2|\, dF(y_1)\, dF(y_2),$$

and the variance, by (5.13), is

$$\sigma^2(d) = \frac{2}{n(n-1)} \{2\zeta_1(\delta)(n-2) + \zeta_2(\delta)\},$$

where

$$\zeta_1(\delta) = \int \left\{ \int |y_1 - y_2|\, dF(y_2) \right\}^2 dF(y_1) - \delta^2, \tag{9.1}$$

$$\zeta_2(\delta) = \int\!\!\int (y_1 - y_2)^2\, dF(y_1)\, dF(y_2) - \delta^2 = 2\sigma^2(Y) - \delta^2. \tag{9.2}$$

The notation $\zeta_1(\delta)$, $\zeta_2(\delta)$ serves to indicate the relation of these functionals of $F$ to the functional $\delta(F)$; $\delta$ is here merely the symbol of the functional, not a particular value of it. In a similar way we shall write $\Phi(y_1, y_2 \mid \delta) = |y_1 - y_2|$, etc. When there is danger of confusing $\zeta_1(\delta)$ with $\zeta_1(F)$, we may write $\zeta_1(F \mid \delta)$.

U. S. Nair [19] has evaluated $\sigma^2(d)$ for several particular distributions.

By Theorem 7.1, $\sqrt{n}\,(d - \delta)$ is asymptotically normal if $\zeta_2(\delta)$ exists.

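If $Y_1, \dots, Y_n$ do not assume negative values, the coefficient of concentration (cf. Gini [8]) is defined by

$$G = \frac{d}{2\bar{Y}},$$

where $\bar{Y} = \sum Y_\alpha / n$. $G$ is a function of two $U$-statistics. If the $Y_\alpha$'s are identically distributed, if $E\{Y^2\}$ exists, and if $\mu = E\{Y\} > 0$, then, by Theorem 7.5, $\sqrt{n}\,(G - \delta/2\mu)$ tends to be normally distributed with mean 0 and variance

$$\frac{\delta^2}{4\mu^4}\, \zeta_1(\mu) - \frac{\delta}{\mu^3}\, \zeta_1(\mu, \delta) + \frac{1}{\mu^2}\, \zeta_1(\delta),$$

where

$$\zeta_1(\mu) = \int y^2\, dF(y) - \mu^2 = \sigma^2(Y),$$

$$\zeta_1(\mu, \delta) = \int\!\!\int y_1\, |y_1 - y_2|\, dF(y_1)\, dF(y_2) - \mu\delta,$$

and $\zeta_1(\delta)$ is given by (9.1).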
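The two statistics of this subsection are easy to compute directly from their definitions. The following is a minimal sketch in Python; the function names `gini_mean_difference` and `concentration_coefficient` are illustrative choices, not notation from the paper.

```python
from itertools import combinations


def gini_mean_difference(ys):
    """Mean of |y_a - y_b| over all ordered pairs a != b (no repetition)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    # Each unordered pair occurs twice among the n(n-1) ordered pairs.
    pair_sum = sum(abs(a - b) for a, b in combinations(ys, 2))
    return 2.0 * pair_sum / (n * (n - 1))


def concentration_coefficient(ys):
    """G = d / (2 * sample mean), for non-negative data with positive mean."""
    mean = sum(ys) / len(ys)
    if mean <= 0:
        raise ValueError("sample mean must be positive")
    return gini_mean_difference(ys) / (2.0 * mean)


sample = [1.0, 2.0, 4.0]
d = gini_mean_difference(sample)        # pairwise differences are 1, 3, 2
G = concentration_coefficient(sample)   # d divided by twice the mean 7/3
```

For the sample shown, the three pairwise absolute differences average to 2, so `d` is 2 and `G` is 3/7.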

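(c) Functions of ranks and of the signs of variate differences. Let $s(u)$ be the signum function,

$$s(u) = \begin{cases} -1 & \text{if } u < 0, \\ \phantom{-}0 & \text{if } u = 0, \\ \phantom{-}1 & \text{if } u > 0, \end{cases} \tag{9.3}$$

and let

$$c(u) = \tfrac{1}{2}\{1 + s(u)\} = \begin{cases} 0 & \text{if } u < 0, \\ \tfrac{1}{2} & \text{if } u = 0, \\ 1 & \text{if } u > 0. \end{cases} \tag{9.4}$$

If

$$x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n)$$

is a sample of $n$ vectors of $r$ components, we may define the rank $R_\alpha^{(i)}$ of $x_\alpha^{(i)}$ by

$$R_\alpha^{(i)} = \frac{1}{2} + \sum_{\beta=1}^{n} c(x_\alpha^{(i)} - x_\beta^{(i)}) = \frac{n+1}{2} + \frac{1}{2} \sum_{\beta=1}^{n} s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (i = 1, \dots, r). \tag{9.5}$$

If the numbers $x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)}$ are all different, the smallest of them has rank 1, the next smallest rank 2, etc. If some of them are equal, the rank as defined by (9.5) is known as the mid-rank.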
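The rank definition (9.5) can be checked numerically, including the mid-rank behaviour for ties. The sketch below implements both forms of (9.5) in Python; the function names are illustrative only.

```python
def sign(u):
    """Signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def c(u):
    """c(u) = (1 + s(u)) / 2 of (9.4)."""
    return 0.5 * (1 + sign(u))


def ranks_via_c(xs):
    """First form of (9.5): R_a = 1/2 + sum over b of c(x_a - x_b)."""
    return [0.5 + sum(c(a - b) for b in xs) for a in xs]


def ranks_via_s(xs):
    """Second form of (9.5): R_a = (n + 1)/2 + (1/2) sum over b of s(x_a - x_b)."""
    n = len(xs)
    return [(n + 1) / 2 + 0.5 * sum(sign(a - b) for b in xs) for a in xs]


distinct = ranks_via_c([10, 30, 20])   # ordinary ranks 1, 3, 2
tied = ranks_via_c([5, 5, 7])          # the two tied values share mid-rank 1.5
```

Both forms agree on any input, since the term with $\beta = \alpha$ contributes $c(0) = \tfrac{1}{2}$ in the first form and $s(0) = 0$ in the second.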

Any function of the ranks is a function of expressions $c(x_\alpha^{(i)} - x_\beta^{(i)})$.

Conversely, since

$$s(x_\alpha^{(i)} - x_\beta^{(i)}) = s(R_\alpha^{(i)} - R_\beta^{(i)}),$$

any function of expressions $s(x_\alpha^{(i)} - x_\beta^{(i)})$ or $c(x_\alpha^{(i)} - x_\beta^{(i)})$ is a function of the ranks.

Consider a regular functional $\theta(F)$ whose kernel $\Phi(x_1, \dots, x_m)$ depends only on the signs of the variate differences,

$$s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (\alpha, \beta = 1, \dots, m;\ i = 1, \dots, r). \tag{9.6}$$

The corresponding $U$-statistic is a function of the ranks of the sample variates. The function $\Phi$ can take only a finite number of values, $c_1, \dots, c_N$, say. If $\pi_i = P\{\Phi = c_i\}$, $(i = 1, \dots, N)$, we have

$$\theta = c_1 \pi_1 + \cdots + c_N \pi_N, \qquad \sum_{i=1}^{N} \pi_i = 1.$$

$\pi_i$ is a regular functional whose kernel $\Phi_i(x_1, \dots, x_m)$ is equal to 1 or 0 according to whether $\Phi = c_i$ or $\neq c_i$. We have

$$\Phi = c_1 \Phi_1 + \cdots + c_N \Phi_N.$$

In order that $\theta(F)$ exist, the $c_i$ must be finite, and hence $\Phi$ is bounded. Therefore, $E\{\Phi^2\}$ exists, and if $X_1, X_2, \dots$ are identically distributed, the d.f. of $\sqrt{n}\,(U - \theta)$ tends, by Theorem 7.1, to a normal d.f. which is non-singular if $\zeta_1 > 0$.

In the following we shall consider several examples of such functionals.





(d) Difference sign correlation. Consider the bivariate sample

$$(x_1^{(1)}, x_1^{(2)}),\ (x_2^{(1)}, x_2^{(2)}),\ \dots,\ (x_n^{(1)}, x_n^{(2)}). \tag{9.7}$$

To each two members of this sample corresponds a pair of signs of the differences of the respective variables,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \quad s(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha \neq \beta;\ \alpha, \beta = 1, \dots, n). \tag{9.8}$$

(9.8) is a population of $n(n-1)$ pairs of difference signs. Since

$$\sum_{\alpha \neq \beta} s(x_\alpha^{(i)} - x_\beta^{(i)}) = 0, \qquad (i = 1, 2),$$

the covariance $t$ of the difference signs (9.8) is

$$t = \frac{1}{n(n-1)} \sum_{\alpha \neq \beta} s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\beta^{(2)}). \tag{9.9}$$

$t$ will be briefly referred to as the difference sign covariance of the sample (9.7). If all $x^{(1)}$'s and all $x^{(2)}$'s are different, we have

$$\sum_{\alpha \neq \beta} s^2(x_\alpha^{(i)} - x_\beta^{(i)}) = n(n-1), \qquad (i = 1, 2),$$

and then $t$ is the product moment correlation of the difference signs.

It is easily seen that $t$ is a linear function of the number of inversions in the permutation of the ranks of $x^{(1)}$ and $x^{(2)}$.

The statistic $t$ has been considered by Esscher [6], Lindeberg [15], [16], Kendall [12], and others.
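As an illustration of (9.9) and of the remark on inversions, the sketch below computes $t$ directly and compares it with the expression $1 - 4I/\{n(n-1)\}$, where $I$ is the number of inversions. That explicit linear form is not stated in the text; it is an assumption here, derived for samples without ties by counting each discordant pair once as an inversion. Function names are illustrative.

```python
def sign(u):
    """Signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def difference_sign_covariance(pairs):
    """t of (9.9): average product of difference signs over ordered pairs a != b."""
    n = len(pairs)
    total = 0
    for a in range(n):
        for b in range(n):
            if a != b:
                total += (sign(pairs[a][0] - pairs[b][0])
                          * sign(pairs[a][1] - pairs[b][1]))
    return total / (n * (n - 1))


def inversions(pairs):
    """Inversions among second components after sorting by the first (no ties assumed)."""
    ys = [y for _, y in sorted(pairs)]
    return sum(1 for i in range(len(ys))
               for j in range(i + 1, len(ys)) if ys[i] > ys[j])


pairs = [(1, 2), (2, 1), (3, 3)]
n = len(pairs)
t = difference_sign_covariance(pairs)
t_from_inversions = 1 - 4 * inversions(pairs) / (n * (n - 1))
```

In this sample one of the three pairs is discordant, so both expressions give $t = 1/3$.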

$t$ is a $U$-statistic. As a function of a random sample from a bivariate population, $t$ is an unbiased estimate of the regular functional of degree 2,

$$\tau = \int\!\!\int\!\!\int\!\!\int s(x_1^{(1)} - x_2^{(1)})\, s(x_1^{(2)} - x_2^{(2)})\, dF(x_1)\, dF(x_2). \tag{9.10}$$

$\tau$ is the covariance of the signs of differences of the corresponding components of $X_1 = (X_1^{(1)}, X_1^{(2)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)})$ in the population of pairs of independent vectors $X_1$, $X_2$ with identical d.f. $F(x) = F(x^{(1)}, x^{(2)})$. If $F(x^{(1)}, x^{(2)})$ is continuous, $\tau$ is the product moment correlation of the difference signs.

Two points (or vectors), $(x_1^{(1)}, x_1^{(2)})$ and $(x_2^{(1)}, x_2^{(2)})$, are called concordant or discordant according to whether

$$(x_1^{(1)} - x_2^{(1)})(x_1^{(2)} - x_2^{(2)})$$

is positive or negative. If $\pi^{(c)}$ and $\pi^{(d)}$ are the probabilities that a pair of vectors drawn at random from the population is concordant or discordant, respectively, we have from (9.10)

$$\tau = \pi^{(c)} - \pi^{(d)}.$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\pi^{(c)} + \pi^{(d)} = 1$, and hence

$$\tau = 2\pi^{(c)} - 1 = 1 - 2\pi^{(d)}. \tag{9.11}$$





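If we put

$$\bar{F}(x^{(1)}, x^{(2)}) = \tfrac{1}{4}\{F(x^{(1)} - 0, x^{(2)} - 0) + F(x^{(1)} - 0, x^{(2)} + 0) + F(x^{(1)} + 0, x^{(2)} - 0) + F(x^{(1)} + 0, x^{(2)} + 0)\}, \tag{9.12}$$

we have

$$\Phi_1(x \mid \tau) = 1 - 2\bar{F}(x^{(1)}, \infty) - 2\bar{F}(\infty, x^{(2)}) + 4\bar{F}(x^{(1)}, x^{(2)}), \tag{9.13}$$

and we may write

$$\tau = E\{\Phi_1(X_1 \mid \tau)\}. \tag{9.14}$$

The variance of $t$ is, by (5.13),

$$\sigma^2(t) = \frac{2}{n(n-1)} \{2\zeta_1(\tau)(n-2) + \zeta_2(\tau)\}, \tag{9.15}$$

where

$$\zeta_1(\tau) = E\{\Phi_1^2(X_1 \mid \tau)\} - \tau^2, \tag{9.16}$$

$$\zeta_2(\tau) = E\{s^2(X_1^{(1)} - X_2^{(1)})\, s^2(X_1^{(2)} - X_2^{(2)})\} - \tau^2. \tag{9.17}$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\zeta_2(\tau) = 1 - \tau^2$, and $\bar{F}(x^{(1)}, x^{(2)})$ in (9.13) may be replaced by $F(x^{(1)}, x^{(2)})$.

The variance of a linear function of $t$ has been given for the continuous case by Lindeberg [15], [16].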

If $X^{(1)}$ and $X^{(2)}$ are independent and have a continuous d.f., we find $\zeta_1(\tau) = \tfrac{1}{9}$, $\zeta_2(\tau) = 1$, and hence

$$\sigma^2(t) = \frac{2(2n + 5)}{9n(n-1)}. \tag{9.18}$$

In this case the distribution of $t$ is independent of the univariate distributions of $X^{(1)}$ and $X^{(2)}$. This is, however, no longer true if the independent variables are discontinuous. Then it appears that $\sigma^2(t)$ depends on $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.
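Formula (9.18) can be verified exactly for small $n$. Under independence with continuous distributions all $n!$ rank permutations are equally probable, so the variance of $t$ is the variance over the uniform distribution on permutations. The sketch below enumerates them with exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def t_of_permutation(perm):
    """t of (9.9) for the sample (i, perm[i]); each unordered pair is counted once."""
    n = len(perm)
    score = 0
    for a in range(n):
        for b in range(a + 1, n):
            score += 1 if perm[a] < perm[b] else -1
    return Fraction(2 * score, n * (n - 1))


def exact_variance_of_t(n):
    """Variance of t when all n! permutations of the ranks are equally likely."""
    values = [t_of_permutation(p) for p in permutations(range(n))]
    mean = sum(values, Fraction(0)) / len(values)
    return sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)


def formula_9_18(n):
    """Right-hand side of (9.18)."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


agreement = {n: exact_variance_of_t(n) == formula_9_18(n) for n in range(3, 7)}
```

Enumeration is only practical for small $n$, but it confirms the closed form without any simulation error.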

By Theorem 7.1, the d.f. of $\sqrt{n}\,(t - \tau)$ tends to the normal form. This result has first been obtained for the particular case that all permutations of the ranks of $X^{(1)}$ and $X^{(2)}$ are equally probable, which corresponds to the independence of the continuous random variables $X^{(1)}$, $X^{(2)}$ (Kendall [12]). In this case $t$ can be represented as a sum of independent random variables (cf. Dantzig [5] and Feller [7]). In the general case the asymptotic normality of $t$ has been shown by Daniels and Kendall [4] and the author [10].

The functional $\tau(F)$ is stationary (and hence the normal limiting distribution of $\sqrt{n}\,(t - \tau)$ singular) if $\zeta_1 = 0$, which, in the case of a continuous $F$, means that the equation $\Phi_1(X \mid \tau) = \tau$ or

$$4F(X^{(1)}, X^{(2)}) = 2F(X^{(1)}, \infty) + 2F(\infty, X^{(2)}) - 1 + \tau \tag{9.19}$$

is satisfied with probability 1. This is the case if $X^{(2)}$ is an increasing function of $X^{(1)}$. Then $t = \tau = 1$ with probability 1, and $\sigma^2(t) = 0$. A case where (9.19) is fulfilled and $\sigma^2(t) > 0$ is the following: $X^{(1)}$ is uniformly distributed in the interval (0,1), and

$$X^{(2)} = X^{(1)} + \tfrac{1}{2} \ \text{ if } 0 \le X^{(1)} < \tfrac{1}{2}, \qquad X^{(2)} = X^{(1)} - \tfrac{1}{2} \ \text{ if } \tfrac{1}{2} \le X^{(1)} \le 1. \tag{9.20}$$

In this case $\tau = 0$, $\zeta_2 = 1$, $\sigma^2(t) = 2/n(n-1)$.

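(e) Rank correlation and grade correlation. If in the sample $(x_\alpha^{(1)}, x_\alpha^{(2)})$, $(\alpha = 1, \dots, n)$, all $x^{(1)}$'s and all $x^{(2)}$'s are different, the rank correlation coefficient, which we denote by $k'$, is given by

$$k' = \frac{12}{n^3 - n} \sum_{\alpha=1}^{n} \Bigl(R_\alpha^{(1)} - \frac{n+1}{2}\Bigr)\Bigl(R_\alpha^{(2)} - \frac{n+1}{2}\Bigr).$$

Inserting (9.5) we have

$$k' = \frac{3}{n^3 - n} \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \sum_{\gamma=1}^{n} s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

or

$$k' = \frac{(n-2)\,k + 3t}{n+1}, \tag{9.21}$$

where $t$ is the difference sign covariance (9.9), and

$$k = \frac{3}{n(n-1)(n-2)} \sum_{\alpha \neq \beta \neq \gamma} s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

the summation being over all different subscripts $\alpha, \beta, \gamma$.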
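The decomposition (9.21) can be checked numerically on any sample without ties. The sketch below computes $k'$ from the ranks, $t$ from (9.9), and $k$ from its triple-sum definition, and compares the two sides of (9.21). The function names and the example sample are illustrative.

```python
from itertools import permutations


def sign(u):
    """Signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def ranks(xs):
    """Ranks by the second form of (9.5)."""
    n = len(xs)
    return [(n + 1) / 2 + 0.5 * sum(sign(a - b) for b in xs) for a in xs]


def rank_correlation(pairs):
    """k': rank correlation coefficient computed from the ranks."""
    n = len(pairs)
    r1 = ranks([p[0] for p in pairs])
    r2 = ranks([p[1] for p in pairs])
    centre = (n + 1) / 2
    return 12 / (n ** 3 - n) * sum((a - centre) * (b - centre)
                                   for a, b in zip(r1, r2))


def t_statistic(pairs):
    """t of (9.9): sum over ordered pairs of distinct subscripts."""
    n = len(pairs)
    total = sum(sign(pairs[a][0] - pairs[b][0]) * sign(pairs[a][1] - pairs[b][1])
                for a, b in permutations(range(n), 2))
    return total / (n * (n - 1))


def k_statistic(pairs):
    """k: sum over ordered triples of distinct subscripts, as defined after (9.21)."""
    n = len(pairs)
    total = sum(sign(pairs[a][0] - pairs[b][0]) * sign(pairs[a][1] - pairs[g][1])
                for a, b, g in permutations(range(n), 3))
    return 3 * total / (n * (n - 1) * (n - 2))


pairs = [(1, 3), (2, 1), (3, 4), (4, 2), (5, 5)]
n = len(pairs)
lhs = rank_correlation(pairs)
rhs = ((n - 2) * k_statistic(pairs) + 3 * t_statistic(pairs)) / (n + 1)
```

The two sides agree up to floating-point rounding; the identity relies on the absence of ties, as assumed in the text.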

$k$ is a $U$-statistic, and as a function of a random sample from a population with d.f. $F$, $k$ is an unbiased estimate of the regular functional of degree 3,

$$\kappa = 3 \int \cdots \int s(x_1^{(1)} - x_2^{(1)})\, s(x_1^{(2)} - x_3^{(2)})\, dF(x_1)\, dF(x_2)\, dF(x_3) = 3 \int\!\!\int \{2\bar{F}^{(1)}(x^{(1)}) - 1\}\{2\bar{F}^{(2)}(x^{(2)}) - 1\}\, dF(x), \tag{9.22}$$

where $\bar{F}^{(1)}(x^{(1)}) = \bar{F}(x^{(1)}, \infty)$, $\bar{F}^{(2)}(x^{(2)}) = \bar{F}(\infty, x^{(2)})$.

If $F$ is continuous, we have

$$\int F^{(i)}(y)\, dF^{(i)}(y) = \int_0^1 u\, du = \tfrac{1}{2},$$

$$\int \{F^{(i)}(y) - \tfrac{1}{2}\}^2\, dF^{(i)}(y) = \int_0^1 (u - \tfrac{1}{2})^2\, du = \tfrac{1}{12}, \qquad (i = 1, 2),$$

and in this case $\kappa$ is the coefficient of correlation between the random variables

$$U^{(1)} = F^{(1)}(X^{(1)}), \qquad U^{(2)} = F^{(2)}(X^{(2)}).$$

$U^{(i)}$ has been termed the grade of the continuous variable $X^{(i)}$, and in the general case $\bar{F}^{(i)}(X^{(i)})$ may be called the grade of $X^{(i)}$ (cf., for instance, G. U. Yule and M. G. Kendall [22, p. 150]). In general, $\kappa$ is 12 times the covariance of the grades.

From (9.21) we have for the expected value of $k'$,

$$E\{k'\} = \frac{(n-2)\,\kappa + 3\tau}{n+1}.$$

In the continuous case the rank correlation coefficient $k'$ is an estimate of the grade correlation $\kappa$, which is biased for finite $n$ but unbiased in the limit.

The kernel $3s(x_1^{(1)} - x_2^{(1)})\, s(x_1^{(2)} - x_3^{(2)})$ of $\kappa$ is not symmetric. Denoting by $\Phi(x_1, x_2, x_3 \mid \kappa)$ the symmetric kernel of $\kappa$, we have

$$\Phi(x_1, x_2, x_3 \mid \kappa) = \frac{1}{2} \sum_{\alpha \neq \beta \neq \gamma} s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\gamma^{(2)}). \tag{9.23}$$

For computing $\kappa$ and the constants $\zeta_c$ an alternative expression for $\kappa$ and $\Phi$ is sometimes more convenient. From three two-dimensional vectors $x_1, x_2, x_3$ we can form three pairs $(x_1, x_2)$, $(x_1, x_3)$, and $(x_2, x_3)$. The number of concordant pairs among them can be 3, 2, 1, or 0. If $\gamma$ is the probability that among the three pairs formed from three random elements of the population at least 2 are concordant, we have, if the d.f. $F$ is continuous,

$$\kappa = 2\gamma - 1. \tag{9.24}$$

This is analogous to the expression (9.11) for $\tau$.

The truth of (9.24) can be seen as follows: From the definition of $\gamma$ we have

$$\gamma = E\{\Phi(X_1, X_2, X_3 \mid \gamma)\},$$

where $\Phi(x_1, x_2, x_3 \mid \gamma)$ is $= 1$ if at least two of the three expressions

$$(x_\alpha^{(1)} - x_\beta^{(1)})(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha < \beta;\ \alpha, \beta = 1, 2, 3) \tag{9.25}$$

are positive, and equal to zero if no more than one of them is positive. Since, by the continuity of $F$, we may neglect the case of (9.25) being zero, we may write

$$\Phi(x_1, x_2, x_3 \mid \gamma) = C_{12,12} C_{23,23} C_{31,31} + C_{12,12} C_{23,23} C_{31,13} + C_{12,12} C_{23,32} C_{31,31} + C_{12,21} C_{23,23} C_{31,31},$$

where

$$C_{\alpha\beta,\gamma\delta} = c\{(x_\alpha^{(1)} - x_\beta^{(1)})(x_\gamma^{(2)} - x_\delta^{(2)})\}$$

and $c(u)$ is defined by (9.4).

$\Phi(x_1, x_2, x_3 \mid \gamma)$ is symmetric in $x_1, x_2, x_3$.

The identity

$$\Phi(x_1, x_2, x_3 \mid \kappa) = 2\Phi(x_1, x_2, x_3 \mid \gamma) - 1 \tag{9.26}$$

can be shown to hold either by algebraical calculation using (9.4) or by direct computation of each side for the different positions of the three points $x_1, x_2, x_3$. From (9.26) it appears that in the continuous case the symmetric kernel $\Phi(x_1, x_2, x_3 \mid \kappa)$ can assume only two values, $-1$ and $+1$.
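The "direct computation" route to (9.26) is easy to mimic numerically. The sketch below evaluates both kernels on randomly drawn triples of points with distinct coordinates and checks the identity on each; the function names, the seed, and the number of triples are arbitrary illustrative choices.

```python
import random
from itertools import combinations, permutations


def sign(u):
    """Signum function s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def kernel_kappa(points):
    """Symmetric kernel (9.23): half the sum over distinct (a, b, g) in {0, 1, 2}."""
    total = sum(sign(points[a][0] - points[b][0]) * sign(points[a][1] - points[g][1])
                for a, b, g in permutations(range(3), 3))
    return total / 2


def kernel_gamma(points):
    """1 if at least two of the three pairs are concordant, otherwise 0."""
    concordant = sum(1 for p, q in combinations(points, 2)
                     if (p[0] - q[0]) * (p[1] - q[1]) > 0)
    return 1 if concordant >= 2 else 0


rng = random.Random(0)
triples = [[(rng.random(), rng.random()) for _ in range(3)] for _ in range(1000)]
identity_holds = all(kernel_kappa(pts) == 2 * kernel_gamma(pts) - 1
                     for pts in triples)
```

A random check of this kind is not a proof, but it exercises all relative positions of three points in general position.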

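The variance of $k$ is, according to (5.13),

$$\sigma^2(k) = \frac{6}{n(n-1)(n-2)} \Bigl\{ 3\binom{n-3}{2}\, \zeta_1(\kappa) + 3(n-3)\, \zeta_2(\kappa) + \zeta_3(\kappa) \Bigr\},$$

where

$$\zeta_1(\kappa) = E\{\Phi_1^2(X_1 \mid \kappa)\} - \kappa^2,$$

$$\zeta_2(\kappa) = E\{\Phi_2^2(X_1, X_2 \mid \kappa)\} - \kappa^2,$$

$$\zeta_3(\kappa) = E\{\Phi^2(X_1, X_2, X_3 \mid \kappa)\} - \kappa^2,$$

$$\Phi_1(x_1 \mid \kappa) = E\{\Phi(x_1, X_2, X_3 \mid \kappa)\},$$

$$\Phi_2(x_1, x_2 \mid \kappa) = E\{\Phi(x_1, x_2, X_3 \mid \kappa)\}.$$

We find for the continuous case

$$\zeta_3(\kappa) = 1 - \kappa^2,$$

$$\Phi_1(x_1 \mid \kappa) = [1 - 2F(x_1^{(1)}, \infty)][1 - 2F(\infty, x_1^{(2)})] - 2F(x_1^{(1)}, \infty) - 2F(\infty, x_1^{(2)}) + 4\int F(x_1^{(1)}, y^{(2)})\, dF(\infty, y^{(2)}) + 4\int F(y^{(1)}, x_1^{(2)})\, dF(y^{(1)}, \infty), \tag{9.27}$$

$$\Phi_2(x_1, x_2 \mid \kappa) = 1 + 2F(x_1^{(1)}, x_2^{(2)}) + 2F(x_2^{(1)}, x_1^{(2)}) - 2c(x_2^{(2)} - x_1^{(2)})\, F(x_1^{(1)}, \infty) - 2c(x_1^{(2)} - x_2^{(2)})\, F(x_2^{(1)}, \infty) - 2c(x_2^{(1)} - x_1^{(1)})\, F(\infty, x_1^{(2)}) - 2c(x_1^{(1)} - x_2^{(1)})\, F(\infty, x_2^{(2)}).$$

If $X^{(1)}$, $X^{(2)}$ are continuous and independent, we obtain $\kappa = 0$, $\zeta_1 = \tfrac{1}{9}$, $\zeta_2 = \tfrac{7}{18}$, $\zeta_3 = 1$, and hence

$$\sigma^2(k) = \frac{n^2 - 3}{n(n-1)(n-2)}. \tag{9.28}$$

In the discontinuous case of independence the distribution of $k$, as that of $t$, depends on the distributions of $X^{(1)}$ and $X^{(2)}$, and $\sigma^2(k)$ can again be expressed in terms of $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.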

The variance of the rank correlation coefficient $k'$ is, by (9.21),

$$\sigma^2(k') = \frac{(n-2)^2\, \sigma^2(k) + 6(n-2)\, \sigma(t, k) + 9\, \sigma^2(t)}{(n+1)^2}. \tag{9.29}$$



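For $\sigma(t, k)$ we have, according to (6.5),

$$\sigma(t, k) = \frac{6}{n(n-1)} \{(n-3)\, \zeta_1(\tau, \kappa) + \zeta_2(\tau, \kappa)\},$$

where

$$\zeta_1(\tau, \kappa) = E\{\Phi_1(X_1 \mid \tau)\, \Phi_1(X_1 \mid \kappa)\} - \tau\kappa,$$

$$\zeta_2(\tau, \kappa) = E\{\Phi(X_1, X_2 \mid \tau)\, \Phi_2(X_1, X_2 \mid \kappa)\} - \tau\kappa.$$

In the case of independence we see from (9.13) and (9.27) that

$$\Phi_1(x \mid \tau) = \Phi_1(x \mid \kappa) = [1 - 2F(x^{(1)}, \infty)][1 - 2F(\infty, x^{(2)})],$$

and we obtain

$$\zeta_1(\tau, \kappa) = \zeta_1(\kappa) = \zeta_1(\tau) = \tfrac{1}{9}, \qquad \zeta_2(\tau, \kappa) = \tfrac{5}{9}, \tag{9.30}$$

$$\sigma(t, k) = \frac{2(n+2)}{3n(n-1)}. \tag{9.31}$$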


On inserting (9.28), (9.31) and (9.18) in (9.29), we find

$$\sigma^2(k') = \frac{1}{n-1},$$

in accordance with the result obtained for this case by Student and published by K. Pearson [20].
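The value $1/(n-1)$ for the variance of the rank correlation coefficient under independence can be confirmed by exact enumeration for small $n$, since in the continuous case all $n!$ rank permutations are equally probable. The sketch below uses exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def k_prime_of_permutation(perm):
    """k' for the sample (i, perm[i]), with ranks 1..n in both components."""
    n = len(perm)
    centre = Fraction(n + 1, 2)
    total = sum(((Fraction(i + 1) - centre) * (Fraction(p + 1) - centre)
                 for i, p in enumerate(perm)), Fraction(0))
    return Fraction(12, n ** 3 - n) * total


def exact_variance_of_k_prime(n):
    """Variance of k' when all n! permutations of the ranks are equally likely."""
    values = [k_prime_of_permutation(p) for p in permutations(range(n))]
    mean = sum(values, Fraction(0)) / len(values)
    return sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)


agreement = {n: exact_variance_of_k_prime(n) == Fraction(1, n - 1)
             for n in range(3, 7)}
```

For $n = 3$ the six permutations give the values $1, \tfrac12, \tfrac12, -\tfrac12, -\tfrac12, -1$, whose variance is $\tfrac12$.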

According to Theorem 7.1, $\sqrt{n}\,(k - \kappa)$ tends to be normally distributed with mean 0 and variance $9\zeta_1(\kappa)$. The same is true for the distribution of the rank correlation coefficient, $k'$, as follows from Theorem 7.3 in conjunction with (9.21). For the special case of independence the asymptotic normality of $k'$ has been proved by Hotelling and Pabst [11].

From Theorem 7.3 it also follows that the joint distribution of $\sqrt{n}\,(t - \tau)$ and $\sqrt{n}\,(k - \kappa)$ (or $\sqrt{n}\,(k' - \kappa)$) tends to the normal form with the variances $4\zeta_1(\tau)$ and $9\zeta_1(\kappa)$ and the covariance $6\zeta_1(\kappa, \tau)$. In the case of independence we see from (9.30) that the correlation $\rho(t, k)$ between $t$ and $k$ tends to 1, and we have the asymptotic functional relation $3t = 2k$. This result has been conjectured by Kendall and others [14], and proved by Daniels [3]. In general, however, $\rho(t, k)$ does not approach unity. Thus, if $X^{(1)}$ is uniformly distributed in (0,1), and

$$X^{(2)} = \begin{cases} \tfrac{1}{2} - X^{(1)} & \text{if } 0 \le X^{(1)} < \tfrac{1}{4}, \\ \tfrac{1}{2} + X^{(1)} & \text{if } \tfrac{1}{4} \le X^{(1)} < \tfrac{1}{2}, \\ X^{(1)} - \tfrac{1}{2} & \text{if } \tfrac{1}{2} \le X^{(1)} < \tfrac{3}{4}, \\ \tfrac{3}{2} - X^{(1)} & \text{if } \tfrac{3}{4} \le X^{(1)} \le 1, \end{cases} \tag{9.32}$$

we have $\tau = \kappa = 0$, $\zeta_1(\tau) = 0$, $\zeta_2(\tau) = 1$, $\zeta_1(\kappa) = \tfrac{1}{16}$, $\zeta_1(\kappa, \tau) = 0$, and hence $\rho(t, k) \to 0$.





(f) Non-parametric tests of independence. Suppose that the random variables $X^{(1)}$, $X^{(2)}$ have a continuous joint d.f. $F(x^{(1)}, x^{(2)})$, and we want to test the hypothesis $H_0$ that $X^{(1)}$ and $X^{(2)}$ are independent, that is, that

$$F(x^{(1)}, x^{(2)}) = F(x^{(1)}, \infty)\, F(\infty, x^{(2)}).$$

The distribution of any statistic involving only the ranks of the variables does not depend on the d.f. of the population when $H_0$ is true. For this reason several rank order statistics, among them the difference sign correlation $t$ and the rank correlation $k'$, have been suggested for testing independence.

From the preceding results we can obtain the asymptotic power functions of the tests of independence based on $t$ and $k'$. If $H_0$ is true, we have $E\{t\} = \tau = 0$, and the critical region of size $\epsilon$ of the $t$-test may be defined by $|t| > c_n$, where $c_n$ is the smallest number satisfying the inequality

$$P\{|t| > c_n \mid H_0\} \le \epsilon. \tag{9.33}$$

By Theorem 7.2 and (9.18) we may write $c_n = 2\lambda_n / 3\sqrt{n}$, where $\lambda_n$ tends to a positive constant $\lambda$ depending on $\epsilon$.

Since $\sigma^2(t) = O(n^{-1})$, the power function

$$P_n(H) = P\{|t| > 2\lambda_n / 3\sqrt{n} \mid H\}$$

tends to one as $n \to \infty$ for any alternative hypothesis $H$ with $\tau(F) \neq 0$. If, however, $\tau = 0$, we have $\lim P_n(H) < 1$. If $\tau = 0$ and $\zeta_1(\tau) < \tfrac{1}{9}$, we have even $\lim P_n(H) < \epsilon$, and with respect to these alternatives the test is biased in the limit. Thus, in the case of the distribution (9.20) we have even $P_n(H) \to 0$.

In this case there is a functional relationship between the variables, and the distribution must be considered as considerably different from the case of independence.

For the rank correlation test we have a similar result. If $c'_n$ is the smallest number satisfying $P\{|k'| > c'_n \mid H_0\} \le \epsilon$, we have $c'_n = \lambda'_n / \sqrt{n}$, where $\lim \lambda'_n = \lambda$, and the test is biased in the limit if $\kappa = 0$ and $\zeta_1(\kappa) < \tfrac{1}{9}$. This is fulfilled in the case of the distribution (9.32), where $\zeta_1(\kappa) = \tfrac{1}{16}$.

The question arises whether there exist non-parametric tests of independence which are unbiased or unbiased in the limit. This point will be discussed in a separate paper on tests of independence.

(g) Mann's test against trend. Let $Y_1, \dots, Y_n$ be $n$ independent real-valued random variables, $Y_\alpha$ having the continuous d.f. $F_\alpha(y)$, $(\alpha = 1, \dots, n)$.

The hypothesis of randomness,

$$H_1: \ F_1(y) = \cdots = F_n(y)$$

is to be tested against the alternative hypothesis of a "downward trend,"

$$H_2: \ F_1(y) < F_2(y) < \cdots < F_n(y).$$

H. B. Mann [17] has suggested a test of $H_1$ against $H_2$ based on the number $T$ of inequalities $Y_\alpha < Y_\beta$, where $\alpha < \beta$. We may write

$$2T - \frac{n(n-1)}{2} = \sum_{\alpha < \beta} s(Y_\beta - Y_\alpha) = \sum_{\alpha < \beta} s(\alpha - \beta)\, s(Y_\alpha - Y_\beta).$$

The $U$-statistic

$$t = \{4T / n(n-1)\} - 1$$

is the same as (9.9) for the special case when one component is not a random variable.
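Mann's count $T$ and the derived statistic $t$ follow directly from the definitions above. A minimal sketch in Python (function names are illustrative):

```python
def mann_T(ys):
    """Number of pairs a < b with ys[a] < ys[b]."""
    n = len(ys)
    return sum(1 for a in range(n) for b in range(a + 1, n) if ys[a] < ys[b])


def mann_t(ys):
    """t = 4T / (n(n-1)) - 1, which lies between -1 and +1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    return 4 * mann_T(ys) / (n * (n - 1)) - 1


strictly_decreasing = mann_t([5, 4, 3, 2, 1])   # T = 0, so t = -1
mixed = mann_t([1, 3, 2])                       # T = 2, so t = 8/6 - 1
```

A strictly decreasing series gives $t = -1$ and a strictly increasing one gives $t = +1$, so small values of $t$ point towards a downward trend.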
Let

$$\tau_{\alpha\beta} = s(\alpha - \beta) \int\!\!\int s(y_1 - y_2)\, dF_\alpha(y_1)\, dF_\beta(y_2) = s(\alpha - \beta) \Bigl\{ 2 \int F_\beta(y)\, dF_\alpha(y) - 1 \Bigr\}.$$

We have $\tau_{\alpha\beta} = 0$ if $H_1$ is true and $\tau_{\alpha\beta} < 0$ if $H_2$ is true.

Since

$$E\{t\} = \bar\tau_n = \frac{2}{n(n-1)} \sum_{\alpha < \beta} \tau_{\alpha\beta},$$

it follows that $E\{t\} = 0$ under $H_1$ and $< 0$ under $H_2$.

Mann's test against trend has the power function $P_n(H) = P\{t < a_n \mid H\}$, where $a_n$ is the largest number satisfying $P\{t < a_n \mid H_1\} \le \epsilon$.

Since $a_n \to 0$ and, by (5.18), $\sigma^2(t) = O(n^{-1})$, it follows from Tchebycheff's inequality that the test is consistent (that is, $P_n(H_2) \to 1$) and hence unbiased in the limit. This has been shown by Mann who also gave sufficient conditions under which the test is unbiased for finite $n$.

By Theorems 8.1 and 8.2 the distribution of $(t - \bar\tau_n)/\sigma(t)$ is asymptotically normal if certain conditions are satisfied. Since (8.2), (8.3) and (8.13) are fulfilled, either of the conditions (8.4) and (8.14) is sufficient.

(h) The coefficient of partial difference sign correlation. Consider a three-variate sample $x_1, \dots, x_n$; $x_\alpha = (x_\alpha^{(1)}, x_\alpha^{(2)}, x_\alpha^{(3)})$, $(\alpha = 1, \dots, n)$. In a similar way as in section 9d we may form the set of the $n(n-1)$ triplets of difference signs,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \quad s(x_\alpha^{(2)} - x_\beta^{(2)}), \quad s(x_\alpha^{(3)} - x_\beta^{(3)}), \qquad (\alpha \neq \beta;\ \alpha, \beta = 1, \dots, n). \tag{9.34}$$

We shall assume that all $x^{(1)}$'s, $x^{(2)}$'s, and $x^{(3)}$'s are different. Then the triplets (9.34) contain only two different numbers, $+1$ and $-1$. Hence the regression functions of the three-variate population (9.34) are linear.

If $t_{12}$, $t_{13}$, and $t_{23}$ are the difference sign correlations of $\{s(x_\alpha^{(1)} - x_\beta^{(1)}), s(x_\alpha^{(2)} - x_\beta^{(2)})\}$, $\{s(x_\alpha^{(1)} - x_\beta^{(1)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ and $\{s(x_\alpha^{(2)} - x_\beta^{(2)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$, respectively, we have for the coefficient $t_{12.3}$ of partial correlation between $s(x_\alpha^{(1)} - x_\beta^{(1)})$ and $s(x_\alpha^{(2)} - x_\beta^{(2)})$ with respect to $s(x_\alpha^{(3)} - x_\beta^{(3)})$,

$$t_{12.3} = \frac{t_{12} - t_{13}\, t_{23}}{\sqrt{(1 - t_{13}^2)(1 - t_{23}^2)}}. \tag{9.35}$$

This measure of partial correlation has been suggested by Kendall [13] who gave an alternative definition of $t_{12.3}$.
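Formula (9.35) is a plain function of the three pairwise coefficients. The sketch below evaluates it and guards against the degenerate case of a zero denominator; the function name is an illustrative choice.

```python
import math


def partial_difference_sign_correlation(t12, t13, t23):
    """t_{12.3} of (9.35) from the three pairwise difference sign correlations."""
    denominator = math.sqrt((1 - t13 ** 2) * (1 - t23 ** 2))
    if denominator == 0:
        raise ValueError("undefined when t13 or t23 equals +1 or -1")
    return (t12 - t13 * t23) / denominator


unchanged = partial_difference_sign_correlation(0.5, 0.0, 0.0)
reduced = partial_difference_sign_correlation(0.5, 0.5, 0.5)
```

When $t_{13} = t_{23} = 0$ the partial coefficient coincides with $t_{12}$; with all three coefficients equal to $\tfrac12$ it drops to $\tfrac13$.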

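If we have two independent three-dimensional random vectors $X_1 = (X_1^{(1)}, X_1^{(2)}, X_1^{(3)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)}, X_2^{(3)})$ with the same continuous d.f. $F(x^{(1)}, x^{(2)}, x^{(3)})$, the distribution of the difference signs $s(X_1^{(i)} - X_2^{(i)})$, $(i = 1, 2, 3)$, has again linear regression functions, and we may define the partial difference sign correlation

$$\tau_{12.3} = \frac{\tau_{12} - \tau_{13}\, \tau_{23}}{\sqrt{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}},$$

where $\tau_{ij}$ is the difference sign correlation of $X^{(i)}$, $X^{(j)}$.

If $t_{12.3}$ is a function of a random sample, and if $\tau_{13}^2 \neq 1$, $\tau_{23}^2 \neq 1$, the d.f. of $\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ tends, by Theorem 7.5, to the normal d.f. with mean zero and variance

$$\frac{4}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)} \Biggl\{ \zeta_1(\tau_{12}) + \frac{(\tau_{23} - \tau_{12}\tau_{13})^2}{(1 - \tau_{13}^2)^2}\, \zeta_1(\tau_{13}) + \frac{(\tau_{13} - \tau_{12}\tau_{23})^2}{(1 - \tau_{23}^2)^2}\, \zeta_1(\tau_{23}) - 2\, \frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\, \zeta_1(\tau_{12}, \tau_{13}) - 2\, \frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\, \zeta_1(\tau_{12}, \tau_{23}) + 2\, \frac{(\tau_{23} - \tau_{12}\tau_{13})(\tau_{13} - \tau_{12}\tau_{23})}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}\, \zeta_1(\tau_{13}, \tau_{23}) \Biggr\},$$

where

$$\zeta_1(\tau_{ij}) = E\{\Phi_1^2(X \mid \tau_{ij})\} - \tau_{ij}^2,$$

$$\zeta_1(\tau_{ij}, \tau_{gh}) = E\{\Phi_1(X \mid \tau_{ij})\, \Phi_1(X \mid \tau_{gh})\} - \tau_{ij}\tau_{gh},$$

and, for instance (cf. (9.13)),

$$\Phi_1(x \mid \tau_{12}) = 1 - 2F(x^{(1)}, \infty, \infty) - 2F(\infty, x^{(2)}, \infty) + 4F(x^{(1)}, x^{(2)}, \infty).$$

If $\tau_{13} = \tau_{23} = 0$, this variance reduces to $4\zeta_1(\tau_{12})$, and $\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ has the same limiting distribution as $\sqrt{n}\,(t_{12} - \tau_{12})$. This is in particular the case when $X^{(1)}$, $X^{(2)}$, $X^{(3)}$ are independent.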


REFERENCES 

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Math., Cambridge, 1937.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] H. E. Daniels, "The relation between measures of correlation in the universe of sample permutations," Biometrika, Vol. 33 (1944), pp. 129-136.

[4] H. E. Daniels and M. G. Kendall, "The significance of rank correlations where parental correlation exists," Biometrika, Vol. 34 (1947), pp. 197-208.

[5] G. B. Dantzig, "On a class of distributions that approach the normal distribution function," Annals of Math. Stat., Vol. 10 (1939), pp. 247-253.

[6] F. Esscher, "On a method of determining correlation from the ranks of the variates," Skandinavisk Aktuarietidskrift, Vol. 7 (1924), pp. 201-219.

[7] W. Feller, "The fundamental limit theorems in probability," Am. Math. Soc. Bull., Vol. 51 (1945), pp. 800-832.

[8] C. Gini, "Sulla misura della concentrazione e della variabilità dei caratteri," Atti del R. Istituto Veneto di S.L.A., Vol. 73 (1913-14), Part 2.

[9] P. R. Halmos, "The theory of unbiased estimation," Annals of Math. Stat., Vol. 17 (1946), pp. 34-43.

[10] W. Hoeffding, "On the distribution of the rank correlation coefficient $\tau$, when the variates are not independent," Biometrika, Vol. 34 (1947), pp. 183-196.

[11] H. Hotelling and M. R. Pabst, "Rank correlation and tests of significance involving no assumptions of normality," Annals of Math. Stat., Vol. 7 (1936), pp. 29-43.

[12] M. G. Kendall, "A new measure of rank correlation," Biometrika, Vol. 30 (1938), pp. 81-93.

[13] M. G. Kendall, "Partial rank correlation," Biometrika, Vol. 32 (1942), pp. 277-283.

[14] M. G. Kendall, S. F. H. Kendall, and B. Babington Smith, "The distribution of Spearman's coefficient of rank correlation in a universe in which all rankings occur an equal number of times," Biometrika, Vol. 30 (1939), pp. 251-273.

[15] J. W. Lindeberg, "Über die Korrelation," VI Skand. Matematikerkongres i København, 1925, pp. 437-446.

[16] J. W. Lindeberg, "Some remarks on the mean error of the percentage of correlation," Nordic Statistical Journal, Vol. 1 (1929), pp. 137-141.

[17] H. B. Mann, "Nonparametric tests against trend," Econometrica, Vol. 13 (1945), pp. 245-259.

[18] R. v. Mises, "On the asymptotic distribution of differentiable statistical functions," Annals of Math. Stat., Vol. 18 (1947), pp. 309-348.

[19] U. S. Nair, "The standard error of Gini's mean difference," Biometrika, Vol. 28 (1936), pp. 428-436.

[20] K. Pearson, "On further methods of determining correlation," Drapers' Company Research Memoirs, Biometric Series, IV, London, 1907.

[21] V. Volterra, Theory of Functionals, Blackie, (authorized translation by Miss M. Long), London and Glasgow, 1931.

[22] G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin, 11th Edition, London, 1937.



OPTIMUM CHARACTER OF THE SEQUENTIAL PROBABILITY RATIO TEST

A. Wald and J. Wolfowitz 
Columbia University 

1. Summary. Let $S_0$ be any sequential probability ratio test for deciding between two simple alternatives $H_0$ and $H_1$, and $S_1$ another test for the same purpose. We define $(i, j = 0, 1)$:

$\alpha_i(S_j)$ = probability, under $S_j$, of rejecting $H_i$ when it is true;

$E_i^j(n)$ = expected number of observations to reach a decision under test $S_j$ when the hypothesis $H_i$ is true. (It is assumed that $E_i^1(n)$ exists.)

In this paper it is proved that, if

$$\alpha_i(S_1) \le \alpha_i(S_0) \qquad (i = 0, 1),$$

it follows that

$$E_i^0(n) \le E_i^1(n) \qquad (i = 0, 1).$$

This means that of all tests with the same power the sequential probability ratio test requires on the average fewest observations. This result had been conjectured earlier ([1], [2]).

2. Introduction. Let $p_i(x)$, $i = 0, 1$, denote two different probability density functions or (discrete) probability functions. (Throughout this paper the index $i$ will always take the values 0, 1.) Let $X$ be a chance variable whose distribution can only be either $p_0(x)$ or $p_1(x)$, but is otherwise unknown. It is required to decide between the hypotheses $H_0$, $H_1$, where $H_i$ states that $p_i(x)$ is the distribution of $X$, on the basis of $n$ independent observations $x_1, \dots, x_n$ on $X$, where $n$ is a chance variable defined (finite) on almost every infinite sequence

$$\omega = x_1, x_2, \dots$$

i.e., $n$ is finite with probability one according to both $p_0(x)$ and $p_1(x)$. The definition of $n(\omega)$ together with the rule for deciding on $H_0$ or $H_1$ constitute a sequential test.

A sequential probability ratio test is defined with the aid of two positive numbers, $A^* > 1$, $B^* < 1$, as follows: Write for brevity

$$p_{ij} = \prod_{k=1}^{j} p_i(x_k).$$

Then $n = j$ if

$$\frac{p_{1j}}{p_{0j}} \ge A^* \quad \text{or} \quad \le B^*$$

and

$$B^* < \frac{p_{1k}}{p_{0k}} < A^*, \qquad k < j.$$

If $\dfrac{p_{1n}}{p_{0n}} \ge A^*$ the hypothesis $H_1$ is accepted, if $\dfrac{p_{1n}}{p_{0n}} \le B^*$ the hypothesis $H_0$ is accepted.
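The stopping rule just defined translates into a short loop. The following is a minimal sketch, not the authors' procedure in any implementation sense: the function `sprt`, its argument names, and the Bernoulli example are illustrative assumptions. It compares $p_{1j}$ with $A^* p_{0j}$ and $B^* p_{0j}$ rather than forming the ratio, so that a vanishing $p_{0j}$ causes no division error.

```python
from fractions import Fraction


def sprt(observations, p0, p1, A, B):
    """Sequential probability ratio test with boundaries A > 1 > B > 0.

    p0 and p1 map an observation to its probability (or density) under H0, H1.
    Returns (n, accepted_index), or None if the data run out before a decision.
    """
    if not (A > 1 and 0 < B < 1):
        raise ValueError("boundaries must satisfy A > 1 and 0 < B < 1")
    p0j = p1j = 1
    for j, x in enumerate(observations, start=1):
        p0j *= p0(x)
        p1j *= p1(x)
        # Compare products instead of dividing p1j by p0j.
        if p1j >= A * p0j:
            return j, 1   # accept H1
        if p1j <= B * p0j:
            return j, 0   # accept H0
    return None


def bernoulli(p):
    """Probability function of a 0/1 variable with success probability p."""
    return lambda x: p if x == 1 else 1 - p


# Hypothetical example: H0 has success probability 1/4, H1 has 3/4.
h0 = bernoulli(Fraction(1, 4))
h1 = bernoulli(Fraction(3, 4))
upper, lower = 9, Fraction(1, 9)
all_ones = sprt([1, 1, 1], h0, h1, upper, lower)
all_zeros = sprt([0, 0], h0, h1, upper, lower)
mixed = sprt([1, 0, 1, 1], h0, h1, upper, lower)
undecided = sprt([1, 0], h0, h1, upper, lower)
```

In this example each success multiplies the ratio by 3 and each failure divides it by 3, so two net successes reach the upper boundary 9 and two net failures reach the lower boundary 1/9. Exact fractions are used so that the boundary comparisons are not affected by rounding.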

In this paper we limit consideration to sequential tests for which $E_i(n)$ exists, where $E_i(n)$ is the expected value of $n$ when $H_i$ is true (i.e., when $p_i(x)$ is the distribution of $X$). It has been proved in [3] that all sequential probability ratio tests belong to this class. The purpose of the paper is to prove the result stated in the first section. Throughout the proof we shall find it convenient to assume that there is an a priori probability $g_i$ that $H_i$ is true ($g_0 + g_1 = 1$; we shall write $g = (g_0, g_1)$). We are aware of the fact that many statisticians believe that in most problems of practical importance either no a priori probability distribution exists, or that even where it exists the statistical decision must be made in ignorance of it; in fact we share this view. Our introduction of the a priori probability distribution is a purely technical device for achieving the proof which has no bearing on statistical methodology, and the reader will verify that this is so. We shall always assume below that $g_0 \neq 0, 1$.

Let $W_0$, $W_1$, $c$ be given positive numbers. We define

$$R = g_0(W_0 \alpha_0 + c E_0(n)) + g_1(W_1 \alpha_1 + c E_1(n)),$$

and call $R$ the average risk associated with a test $S$ and a given $g$ (obviously $R$ is a function of both). We shall say that $H_i$ is accepted when the decision is made that $p_i(x)$ is the distribution of $X$. We shall say that $H_0$ is rejected when $H_1$ is accepted, and vice versa. The reader may find it helpful to regard $W_i$ as a weight which measures the loss caused by rejecting $H_i$ when it is true, $c$ as the cost of a single observation, and $R$ as the average loss associated with a given $g$ and a test $S$. For mathematical purposes these are simply quantities which we manipulate in the course of the proof.


3. Role of the probability ratio. Let g, $W = (W_0, W_1)$, and c be fixed. Let S be a given sequential test, with R(S) the associated risk and $n(\omega, S)$ the associated "sample size" function. Let $\psi(x_1, \ldots, x_n)$ be the "decision" function; this is a function which takes only the values 0 and 1, and such that, when $x_1, \ldots, x_n$ is the sample point, the hypothesis with index $\psi(x_1, \ldots, x_n)$ is rejected. Define the following decision function $\varphi(x_1, \ldots, x_n)$: $\varphi = 0$ when

$$\lambda = \frac{W_1 g_1 p_{1n}}{W_0 g_0 p_{0n}}$$

is greater than 1, and $\varphi = 1$ when $\lambda < 1$. When $\lambda = 1$, $\varphi$ may be 0 or 1 at pleasure.

It must be remembered that an actual decision function is a single-valued function of $(x_1, \ldots, x_n)$. We note, however, that

a) the relevant properties of a test are not affected by changing the test on a set T of points $\omega$ whose probability is zero according to both $H_0$ and $H_1$, i.e., changing the definition on T of n and/or of the decision function leaves $\alpha_0$, $\alpha_1$, $E_0(n)$ and $E_1(n)$ unaltered. In particular, the average risk R remains unchanged.

b) the set of points for which $p_{0n} = p_{1n} = 0$ and $\lambda$ is indeterminate has probability zero according to both $H_0$ and $H_1$.

In view of the above we decide arbitrarily, in all sequential tests which we shall henceforth consider, to define n = j, and $\varphi = 0$, whenever $p_{0j} = p_{1j} = 0$, and $n \ne 1, \ldots, (j - 1)$. By this arbitrary action R will not be changed.

Let now

$$L_{in} = \frac{W_i\, g_i\, p_{in}}{g_0 p_{0n} + g_1 p_{1n}}, \qquad i = 0, 1,$$

$$L_n = cn + \min(L_{0n}, L_{1n}).$$
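The connection between $\varphi$ and the quantities $L_{0n}$, $L_{1n}$ can be checked numerically. The Python sketch below is an illustration only; the function names are hypothetical, and ties ($\lambda = 1$) are resolved as 1 purely for definiteness.

```python
def stopping_losses(W, g, p0n, p1n):
    """Return (L_0n, L_1n): the weighted loss of rejecting H0, resp. H1."""
    denom = g[0] * p0n + g[1] * p1n
    return W[0] * g[0] * p0n / denom, W[1] * g[1] * p1n / denom


def phi(W, g, p0n, p1n):
    """Index of the hypothesis rejected by the rule phi.

    phi = 0 when lambda = W1*g1*p1n / (W0*g0*p0n) exceeds 1; the products are
    compared directly so that p0n = 0 needs no division.
    """
    return 0 if W[1] * g[1] * p1n > W[0] * g[0] * p0n else 1


W, g = (1.0, 2.0), (0.5, 0.5)
L0, L1 = stopping_losses(W, g, p0n=0.3, p1n=0.2)
rejected = phi(W, g, p0n=0.3, p1n=0.2)
```

Here $\lambda = 0.2/0.15 > 1$, so $\varphi = 0$, and correspondingly $L_{0n} = 0.6$ is the smaller of the two losses: $\varphi$ always rejects the hypothesis whose rejection is cheaper.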

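We have

$$R = E(cn + L_{\psi n}),$$

where $L_{\psi n}$ stands for $L_{in}$ with $i = \psi(x_1, \ldots, x_n)$, and where the operator E denotes the expected value with respect to the joint distribution of $H_i$ and $(x_1, \ldots, x_n)$, i.e., E is the operator $g_0E_0 + g_1E_1$. If now the event $\{\psi \ne \varphi$ and $\lambda \ne 1\}$ has positive probability according to either $H_0$ or $H_1$, we would have, for $n = n(\omega, S)$,

$$E L_{\varphi n} < E L_{\psi n}.$$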

Hence, if the decision function connected with the test S were replaced by the decision function $\varphi$, R would be decreased. Since our object throughout this proof will be to make R as small as possible, we shall confine ourselves henceforth, except when the contrary is explicitly stated, to tests for which $\varphi$ is the decision function. This will be assumed even if not explicitly stated.

The function $\varphi$ has not yet been uniquely defined when $\lambda = 1$. A definition convenient for later purposes will be given in the next section. R is the same for all definitions.

We thus have that $\varphi$ is a function only of $\lambda$, or, what comes to the same thing when W is fixed, of $r_n = p_{1n}/p_{0n}$. Define

$$r_i = \frac{p_{1i}}{p_{0i}}, \qquad i = 1, 2, \ldots.$$



We shall now prove

Lemma 1. Let g, W, and c be fixed. There exists a sequential test S* for which the average risk is a minimum. Its sample size function $n(\omega, S^*)$ can be defined by means of a properly chosen subset K of the non-negative half-line as follows: For any $\omega$ consider the associated sequence

$$r_1, r_2, \ldots$$

and let j be the smallest integer for which $r_j \in K$. Then n = j. The function n may be undefined on a set of points $\omega$ whose probability according to $H_0$ and $H_1$ is zero.

Let $a = (a_1, \ldots, a_d)$ be any point in some finite d-dimensional Euclidean space, provided only that $p_{0d}(a)$ and $p_{1d}(a)$ are not both zero. Let $b = r_d(a)$ and let $l(a) = cd + \min(L_{0d}, L_{1d})$. Let D be any sequential test whatever for which $n(\omega, D) > d$ for any $\omega$ whose first d coordinates are the same as those of a, and for which $E(n \mid a, D) < \infty$, where $E(n \mid a, D)$ is the conditional expected value of n according to the test D under the condition that the first d coordinates of $\omega$ are the same as those of a. For brevity let G represent the set of points $\omega$ which fulfill this last condition, i.e., that the first d coordinates of $\omega$ are the same as those of a. Finally, let $E(L_n \mid a, D)$ be the conditional expected value of $L_n$ according to D under the condition that $\omega$ is in the set G. We know that $\min(L_{0d}, L_{1d})$ depends only on $r_d(a) = b$.

Write

$$\nu(a) = \sup_D\,[\,l(a) - E(L_n \mid a, D)\,].$$

Let $a_0 = (a_{01}, \ldots, a_{0k})$ be any point such that

$$\frac{p_{1d}(a)}{p_{0d}(a)} = \frac{p_{1k}(a_0)}{p_{0k}(a_0)}.$$

Let $D_0$ be any sequential test whatever for which $n(\omega, D_0) > k$ for any $\omega$ whose first k coordinates are the same as those of $a_0$, and for which $E(n \mid a_0, D_0) < \infty$. Let

$$\nu(a_0) = \sup_{D_0}\,[\,l(a_0) - E(L_n \mid a_0, D_0)\,].$$

We shall prove that $\nu(a) = \nu(a_0)$. Thus we shall be justified in writing

$$\gamma(b) = \nu(a) = \nu(a_0).$$

Suppose, therefore, that $\nu(a) > \nu(a_0)$. Let $D_1$ be a test of the type D such that

$$l(a) - E(L_n \mid a, D_1) > \tfrac12\,[\nu(a) + \nu(a_0)].$$

We now partially define another sequential test $D_{10}$ of the type $D_0$ as follows: Let

$$\bar a = a_1, \ldots, a_d, y_1, \ldots, y_t$$

be any sequence such that $n(\bar a, D_1) = d + t$. Then for the sequence

$$\bar a_0 = a_{01}, \ldots, a_{0k}, y_1, \ldots, y_t$$

let $n(\bar a_0, D_{10}) = k + t$. The decision function $\psi_0$ associated with $D_{10}$ will be partially defined as follows:

$$\psi_0(\bar a_0) = \varphi(\bar a).$$

(The reader will observe that it may happen that $\psi_0(\bar a_0) \ne \varphi(\bar a_0)$.) Since $r_d(a) = r_k(a_0)$ it follows that

$$l(a) - E(L_n \mid a, D_1) = l(a_0) - E(L_n \mid a_0, D_{10}) > \tfrac12\,[\nu(a) + \nu(a_0)] > \nu(a_0),$$

in violation of the definition of $\nu(a_0)$. A similar contradiction is obtained if $\nu(a) < \nu(a_0)$. Hence $\nu(a) = \nu(a_0)$, as was stated above.

We define K to consist of all numbers b which are such that there exist points a with $r_d(a) = b$, and for which $\gamma(b) \le 0$. We shall now prove that the test S* defined in the statement of the lemma is such that R(S*) is a minimum. Recall that the average risk is the expected value of $L_n$. Let S be any other test. Let $a^* = (a_1, \ldots, a_{d^*})$ be any sequence such that either $n(a^*, S^*) = d^*$ or $n(a^*, S) = d^*$, but $n(a^*, S^*) \ne n(a^*, S)$. We exclude the trivial case that the probability of the occurrence of such a sequence, under both $H_0$ and $H_1$, is zero. Let $r_{d^*}(a^*) = b^*$. The sequence a* may be one of three types:

1) $\gamma(b^*) < 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. It is more advantageous, from the point of view of diminishing the average risk, to terminate the sequential process at once, since $E(L_n \mid a^*, S) > l(a^*)$.

2) $\gamma(b^*) = 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. If $l(a^*) - E(L_n \mid a^*, S) = 0$, i.e., the supremum is actually attained by S, then, as far as the average risk is concerned, it makes no difference whether the sequential process is terminated with a* or continued according to S. If, however, $l(a^*) - E(L_n \mid a^*, S) < 0$, it is clearly disadvantageous to proceed according to S. It is impossible that $l(a^*) - E(L_n \mid a^*, S) > 0$, since $\gamma(b^*) = 0$.

3) $\gamma(b^*) > 0$. Hence $b^* \notin K$, $n(a^*, S) = d^*$. Clearly it is more advantageous from the point of view of diminishing the average risk not to terminate the sequential process, but to continue with at least one more observation. After one more observation we are either in case 1 or 2, where it is advantageous to terminate the sequential process, or again in case 3, where it is advantageous to take yet another observation.

We conclude that R(S*) is a minimum, as was to be proved.

4. A fundamental lemma. Consider the complement of K with respect to the non-negative half-line, and from it delete all points b' for which there exists no point a in some d-dimensional Euclidean space such that $r_d(a) = b'$. The point 1 is never to be considered as of the type of b', i.e., 1 is never to be deleted. Designate the resulting set by $\bar K$.

Our proof of the theorem to which this paper is devoted hinges on the following lemma:

Lemma 2. Let W, g, c be fixed, and $\bar K$ be as defined above. There exist two positive numbers A and B, with $B \le \dfrac{W_0g_0}{W_1g_1} \le A$, such that

a) if $b \in K$, then either $b \ge A$ or $b \le B$;

b) if $b \in \bar K$, then $B < b < A$.

Two remarks may be made before proceeding with the proof:

1) We may now complete the definition of $\varphi$ for tests of the type of S*. The reader will recall that $\varphi$ was not uniquely defined when $\lambda = 1$, i.e., when $r_n = \dfrac{W_0g_0}{W_1g_1}$. Lemma 2 shows that it is necessary to define $\varphi$ only when $\dfrac{W_0g_0}{W_1g_1} \in K$ and $\dfrac{W_0g_0}{W_1g_1}$ is therefore either A or B. We will define $\varphi\!\left(\dfrac{W_0g_0}{W_1g_1}\right)$ as 0 or 1, according as $\dfrac{W_0g_0}{W_1g_1}$ is A or B, and $A \ne B$. This is simply a convenient definition which will give uniqueness. When $A = B = \dfrac{W_0g_0}{W_1g_1} \in K$, the situation is completely trivial, and we may take $\varphi = 0$ arbitrarily.

2) If $1 \in K$ the above lemma shows that the average risk is minimized (for fixed W, g, c, of course) by taking no observations at all. We have $\varphi = 0$ or 1 according as $1 \ge A$ or $1 \le B$.

Proof of the lemma: Let $h > \dfrac{W_0g_0}{W_1g_1}$ be a point in $\bar K$. We will prove that any point h' such that $\dfrac{W_0g_0}{W_1g_1} < h' < h$, and such that there exists a point a' in some d'-dimensional Euclidean space for which $r_{d'}(a') = h'$, is also in $\bar K$. In a similar way it can be shown that, if $h_0 < \dfrac{W_0g_0}{W_1g_1}$ is any point in $\bar K$, any point $h_0'$ such that $h_0 < h_0' < \dfrac{W_0g_0}{W_1g_1}$, and such that there exists a point $a_0'$ in some d''-dimensional Euclidean space for which $r_{d''}(a_0') = h_0'$, is also in $\bar K$. This will prove the lemma.

Let therefore h and h' be as above. Let S* be the sequential test based on K, with the decision function $\varphi$. Let a be a point in d-space such that $r_d(a) = h$. Since $h \in \bar K$ we have $\gamma(h) > 0$.

We now wish to define partially another sequential test $\bar S$, with a decision function $\bar\psi$ which may be different from $\varphi$, as follows: Let a' be defined as above. Write

$$a = (a_1, \ldots, a_d), \qquad a' = (a'_1, \ldots, a'_{d'}).$$

Let

$$\bar a = a_1, \ldots, a_d, y_1, \ldots, y_t$$

be any sequence such that $n(\bar a, S^*) = d + t$. Then for the sequence

$$\bar a' = a'_1, \ldots, a'_{d'}, y_1, \ldots, y_t$$

let $n(\bar a', \bar S) = d' + t$. The decision function $\bar\psi$ associated with $\bar S$ will be partially defined as follows:

$$\bar\psi(\bar a') = \varphi(\bar a).$$

Clearly

$$(4.1)\qquad E_i(n \mid a, S^*) - d = E_i(n \mid a', \bar S) - d' \qquad (i = 0, 1)$$

and

$$(4.2)\qquad E_i(\varphi \mid a, S^*) = E_i(\bar\psi \mid a', \bar S) \qquad (i = 0, 1).$$

Furthermore, we have

$$(4.3)\qquad l(a) - E(L_n \mid a, S^*) = \frac{g_0}{g_0 + g_1h}\,\{W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)]\} + \frac{g_1h}{g_0 + g_1h}\,\{cd - cE_1(n \mid a, S^*) - W_1E_1(\varphi \mid a, S^*)\}.$$

Since $\gamma(h) > 0$, and since

$$(4.4)\qquad cd - cE_1(n \mid a, S^*) - W_1E_1(\varphi \mid a, S^*) < 0,$$

we must have

$$(4.5)\qquad W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)] > 0.$$

From h' < h it follows that

$$(4.6)\qquad \frac{g_0}{g_0 + g_1h'} > \frac{g_0}{g_0 + g_1h}, \qquad \frac{g_1h'}{g_0 + g_1h'} < \frac{g_1h}{g_0 + g_1h}.$$

Relations (4.1), (4.2), (4.4), (4.5) and (4.6) imply that the value of the right hand member of (4.3) is increased by replacing $\varphi$, h, a, S* and d by $\bar\psi$, h', a', $\bar S$, and d', respectively. This proves our lemma.
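The content of Lemmas 1 and 2, namely that sampling continues exactly while the probability ratio lies in one interval (B, A) around $W_0g_0/(W_1g_1)$, can be illustrated numerically. The Python sketch below is not the paper's construction: it assumes Bernoulli observations, works on a grid of a posteriori probabilities $\pi$ of $H_1$ rather than on the ratio itself, and uses plain value iteration with linear interpolation. The function name, grid size and tolerances are all choices made here.

```python
import numpy as np


def continuation_interval(p0, p1, W0, W1, c, g0=0.5, grid=2001, sweeps=2000):
    """Approximate (B, A) on the probability-ratio scale for Bernoulli data.

    Returns (B, A, contiguous), where `contiguous` reports whether the grid
    points at which continuing is strictly cheaper form one unbroken run.
    """
    g1 = 1.0 - g0
    pi = np.linspace(0.0, 1.0, grid)              # posterior probability of H1
    stop = np.minimum(W0 * (1.0 - pi), W1 * pi)   # min(L_0, L_1) at posterior pi

    def continue_cost(V):
        cost = np.full(grid, float(c))            # pay c for one more observation
        for q0, q1 in ((p0, p1), (1.0 - p0, 1.0 - p1)):   # outcomes x = 1, x = 0
            marginal = (1.0 - pi) * q0 + pi * q1
            posterior = pi * q1 / marginal
            cost += marginal * np.interp(posterior, pi, V)
        return cost

    V = stop.copy()
    for _ in range(sweeps):
        V_new = np.minimum(stop, continue_cost(V))
        converged = np.max(np.abs(V_new - V)) < 1e-12
        V = V_new
        if converged:
            break

    idx = np.flatnonzero(continue_cost(V) < stop - 1e-9)
    if idx.size == 0:
        return None                               # never worth sampling
    contiguous = bool(idx[-1] - idx[0] + 1 == idx.size)
    to_ratio = lambda q: g0 * q / (g1 * (1.0 - q))  # invert pi = g1 r / (g0 + g1 r)
    return to_ratio(pi[idx[0]]), to_ratio(pi[idx[-1]]), contiguous


B, A, contiguous = continuation_interval(0.4, 0.6, W0=100.0, W1=100.0, c=1.0)
```

In this symmetric example $W_0g_0/(W_1g_1) = 1$, and the computed continuation set is a single run of grid points with B < 1 < A, in agreement with the lemma.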

If there are values which $r_i$ cannot assume, the pair B, A might not be unique. For convenience we shall define A and B uniquely in the manner described below. We will always adhere to this definition hereafter.

We shall first define $\gamma(h)$ for all positive h in a manner consistent with the previous definition, which defined $\gamma(h)$ only for those values of h which could be assumed by $r_i$. Let h be any positive number and D(h) be any sequential test with the following properties:

(4.7) there exists a set Q(h) of positive numbers such that n = j if and only if the j-th member of the sequence

$$hr_1, hr_2, hr_3, \ldots$$

is the first element of the sequence to be in Q(h);

$$(4.8)\qquad E_i(n \mid D(h)) < \infty \qquad (i = 0, 1).$$

We define, for $h \ge \dfrac{W_0g_0}{W_1g_1}$,

$$(4.9)\qquad \gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1h}\,\{W_0E_0(\varphi \mid D(h)) - cE_0(n \mid D(h))\} + \frac{g_1h}{g_0 + g_1h}\,\{-W_1E_1(\varphi \mid D(h)) - cE_1(n \mid D(h))\},$$

$$(4.10)\qquad \gamma(h) = \sup_{D(h)} \gamma(h \mid D(h)),$$

with a corresponding definition for $h \le \dfrac{W_0g_0}{W_1g_1}$. Thus $\gamma(h)$ is defined for all positive h. This definition coincides with the previous definition whenever the latter is applicable. It is true that the supremum operation in (4.10) is limited to tests which depend only on the probability ratio, as (4.7) implies, but the argument of Lemma 1 shows that this limitation does not diminish the supremum. (It might appear that, for $h = \dfrac{W_0g_0}{W_1g_1}$, $\gamma(h)$ is not uniquely defined. We shall shortly see that this is not the case.)

The quantity $\gamma(h)$ depends, of course, on $g_0$ and $g_1$. To put this in evidence, we shall also write $\gamma(h, g_0, g_1)$. One can easily verify that

$$\gamma(h, g_0, g_1) = \gamma\!\left(1,\ \frac{g_0}{g_0 + g_1h},\ \frac{g_1h}{g_0 + g_1h}\right).$$

More generally, for any positive values h and h', we have $\gamma(h, g_0, g_1) = \gamma(h', \bar g_0, \bar g_1)$, where $\bar g_0$ and $\bar g_1$ are suitable functions of $g_0$, $g_1$, h, and h'. Thus, if h is not an admissible value of the probability ratio and h' is any admissible value, we can interpret the value of $\gamma(h, g_0, g_1)$ as the value of $\gamma$ corresponding to h' and some properly chosen a priori probabilities $\bar g_0$ and $\bar g_1$.


We now define A as the greatest lower bound of all points $h \ge \dfrac{W_0g_0}{W_1g_1}$ for which $\gamma(h) \le 0$. We define B as the least upper bound of all points $h \le \dfrac{W_0g_0}{W_1g_1}$ for which $\gamma(h) \le 0$. If $\gamma(h) \le 0$ for all h the above definition implies $A = B = \dfrac{W_0g_0}{W_1g_1}$.

The argument of Lemma 2 shows that $\gamma(h)$ is monotonically increasing in the interval $\left(B, \dfrac{W_0g_0}{W_1g_1}\right)$, and that $\gamma(h)$ is monotonically decreasing in the interval $\left(\dfrac{W_0g_0}{W_1g_1}, A\right)$.

We shall now define a sequential test S*(h) for every positive h. The decision function of S*(h) will be $\varphi$, and n = j if and only if the j-th member of the sequence

$$\gamma(hr_1), \gamma(hr_2), \gamma(hr_3), \ldots$$

is the first element to be $\le 0$. We see that

$$(4.11)\qquad \gamma(h) = \gamma(h \mid S^*(h))$$

for all h. Incidentally, this proves that $\gamma(h)$ was uniquely defined at $h = \dfrac{W_0g_0}{W_1g_1}$.

We shall now prove

Lemma 3. The function $\gamma(h)$ has the following properties:

a) It is continuous for all h.

b) $\gamma(A) = \gamma(B) = 0$.

c) $\gamma(h) < 0$ for h > A or h < B.

Only a) and c) require proof, since b) is a trivial consequence of a) and the definition of A and B.


Let h be any point except $\dfrac{W_0g_0}{W_1g_1}$, and let z be any point in a neighborhood of h. Within a neighborhood of h both $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded. Let $\Delta$ be an arbitrarily given positive number. Let h' and h'' be any two points in a sufficiently small neighborhood of h, to be described shortly. We proceed as in the argument of Lemma 2, with the present h' corresponding to h of Lemma 2, the present h'' corresponding to h' of Lemma 2, and with S*(h') corresponding to S* of Lemma 2. Since $\dfrac{g_0}{g_0 + g_1z}$ and $\dfrac{g_1z}{g_0 + g_1z}$ are continuous functions of z, and since $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded functions of z, we conclude that, when the neighborhood of h is sufficiently small,

$$\gamma(h'') > \gamma(h') - \Delta.$$

Reversing the roles of h' and h'' we obtain that in this neighborhood

$$\gamma(h') > \gamma(h'') - \Delta,$$

and conclude that

$$|\gamma(h') - \gamma(h'')| < \Delta.$$

Since $\Delta$ was arbitrary, this implies the continuity of $\gamma(h)$ everywhere, except perhaps at $h = \dfrac{W_0g_0}{W_1g_1}$.


To deal with the point $h = \dfrac{W_0g_0}{W_1g_1}$, proceed as follows: Using the above argument and the definition (4.9), (4.10), we prove that $\gamma(h)$ is continuous on the right at $h = \dfrac{W_0g_0}{W_1g_1}$. Using, at the point $h = \dfrac{W_0g_0}{W_1g_1}$, the definition of $\gamma(h \mid D(h))$ for $h \le \dfrac{W_0g_0}{W_1g_1}$, i.e.,

$$(4.12)\qquad \gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1h}\,\{-W_0E_0(1 - \varphi \mid D(h)) - cE_0(n \mid D(h))\} + \frac{g_1h}{g_0 + g_1h}\,\{W_1E_1(1 - \varphi \mid D(h)) - cE_1(n \mid D(h))\},$$

(4.10) and (4.11), we prove that $\gamma(h)$ is continuous on the left at $h = \dfrac{W_0g_0}{W_1g_1}$. This proves a).

To prove c), we proceed as follows: Suppose for $h_0 > A$ we had $\gamma(h_0) = 0$. Since

$$\{-W_1E_1(\varphi \mid S^*(h_0)) - cE_1(n \mid S^*(h_0))\} < 0,$$

we would have that

$$\{W_0E_0(\varphi \mid S^*(h_0)) - cE_0(n \mid S^*(h_0))\} > 0.$$

An argument like that of Lemma 2 would then show that $\gamma(h) > 0$ for $\dfrac{W_0g_0}{W_1g_1} < h < h_0$. This, however, is impossible, because it is a violation of the definition of A.

In a similar way we prove that if h < B, $\gamma(h) < 0$. This proves c) and with it the lemma.


5. The behavior of A and B. Lemma 4. Let g and c be fixed. Then A and B are continuous functions of $W_0$ and $W_1$.

Proof: It will be sufficient to prove that A is continuous, the proof for B being similar. Suppose A > B. Let $h_1$ and $h_2$ be such that

a) $B < h_1 < A < h_2$;

b) $h_2 - h_1 < \Delta$ for an arbitrary positive $\Delta$.

We write $\gamma(h)$ temporarily as $\gamma(h, W_0, W_1)$ in order to exhibit the dependence on $W_0$ and $W_1$. Then

$$\gamma(h_1, W_0, W_1) > 0, \qquad \gamma(h_2, W_0, W_1) < 0.$$

It follows from (4.9) that $\gamma(h \mid D(h))$ is continuous in $W_0$, $W_1$, uniformly in D(h). Hence $\gamma(h, W_0, W_1) = \sup_{D(h)} \gamma(h \mid D(h))$ is also continuous in $W_0$, $W_1$. Hence, for $\Delta W_0$ and $\Delta W_1$ sufficiently small,

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) > 0, \qquad \gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Therefore

$$h_1 < A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2,$$

which proves continuity, since $\Delta$ was arbitrary.

If $\dfrac{W_0g_0}{W_1g_1} = A = B$, we take $h_1 < \dfrac{W_0g_0}{W_1g_1} < h_2$, $h_2 - h_1 < \Delta$, and by a similar argument show that

$$\gamma(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0, \qquad \gamma(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Thus

$$h_1 < B(W_0 + \Delta W_0, W_1 + \Delta W_1) \le A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2.$$

This proves the lemma.

Lemma 5. Let g, c, and $W_1$ be fixed. A is strictly monotonic in $W_0$. As $W_0$ approaches 0, A approaches 0; as $W_0$ approaches $+\infty$, A also approaches $+\infty$.

Proof: Since $A \ge \dfrac{W_0g_0}{W_1g_1}$, $A \to +\infty$ as $W_0 \to \infty$. If $W_0 < c$ no reduction in average risk could compensate for taking even a single observation, no matter what the value of h. Hence $\gamma(h) < 0$ for all h when $W_0 < c$, so that A = B. Since $B \le \dfrac{W_0g_0}{W_1g_1}$, $B \to 0$ as $W_0 \to 0$. Hence $A \to 0$ as $W_0 \to 0$.

It is evident from (4.9) that $\gamma(h \mid D(h))$ is non-decreasing with increasing $W_0$ (everything else fixed). Hence also

$$\gamma(h) = \sup_{D(h)} \gamma(h \mid D(h))$$

is non-decreasing with increasing $W_0$, for fixed $h > \dfrac{W_0g_0}{W_1g_1}$ and fixed $W_1$. For a positive $\Delta$ sufficiently small and for any h such that $A < h < A + \Delta$, we have that

$$E_0(\varphi \mid S^*(h)) > 0.$$

Hence, for such h, $\gamma(h, W_0, W_1)$ is strictly monotonically increasing with increasing $W_0$. Therefore A is (strictly) monotonically increasing with increasing $W_0$.

We now define the function $W_0(W_1, \delta)$ of the two positive arguments $W_1$, $\delta$ so that

$$A(W_0(W_1, \delta), W_1) = \delta.$$

By Lemma 5 such a function exists and is single-valued.


6. Properties of the function $W_0(W_1, \delta)$. Lemma 6. $W_0(W_1, \delta)$ is continuous in $W_1$.

Proof: Let

$$\lim_{N \to \infty} W_{1N} = W_1,$$

and suppose that the sequence $\{W_0(W_{1N}, \delta)\}$ did not converge. Suppose $W_0'$ and $W_0''$ were two distinct limit points of this sequence. From the continuity of A (Lemma 4) it would follow that

$$A(W_0', W_1) = A(W_0'', W_1).$$

This, however, violates Lemma 5. The only remaining possibility to be considered is that

$$\lim_{N \to \infty} W_0(W_{1N}, \delta) = \infty.$$

If that were the case, then, since $A \ge \dfrac{W_0g_0}{W_1g_1}$, it would follow that $A \to \infty$, in violation of the fact that $A \equiv \delta$.
Lemma 7. We have, for fixed $\delta$,

$$\lim_{W_1 \to 0} W_0(W_1) = 0, \qquad \lim_{W_1 \to \infty} W_0(W_1) = \infty.$$

Proof: If, for small $W_1$, $W_0(W_1)$ were bounded below by a positive number, then, since $A \ge \dfrac{W_0g_0}{W_1g_1}$, we could make A arbitrarily large by taking $W_1$ sufficiently small, in violation of the fact that $A = \delta$. To prove the second half of the lemma, assume that $W_0(W_1)$ is bounded above as $W_1 \to \infty$. Then $B \left(\le \dfrac{W_0g_0}{W_1g_1}\right)$ will approach zero as $W_1 \to \infty$. Let h be fixed so that $B < h < \delta$. Consider the totality of points $\omega$ for which there exists an integer $n^*(\omega)$ such that

$$hr_{n^*} \le B, \qquad B < hr_j < \delta, \quad j < n^*.$$

The conditional expected value of n* in this totality, when $H_0$ is true, may be made arbitrarily large by making B sufficiently small. Hence, when $W_1$ is sufficiently large, for fixed but arbitrary $h < \delta$, the optimum procedure from the point of view of minimizing the average risk is to reject $H_0$ at once without taking any more observations. This, however, contradicts the fact that $h < \delta$, and proves the lemma.

Lemma 8. We have, for fixed $\delta > 1$,

$$\lim_{W_1 \to 0} B(W_0(W_1, \delta), W_1) = \delta; \qquad \lim_{W_1 \to \infty} B(W_0(W_1, \delta), W_1) = 0.$$

Proof: By Lemma 7,

$$\lim_{W_1 \to 0} W_0(W_1) = 0.$$

When, for fixed c, both $W_0$ and $W_1$ are small enough, then, no matter what the value of h, $\gamma(h) < 0$. Hence A = B, which proves the first half of the lemma.

Let now $\{W_{1N}\}$ be a sequence such that $\lim W_{1N} = \infty$. Let $\delta > 1$. For the sake of brevity we write $B(W_{1N})$ instead of

$$B(W_0(W_{1N}, \delta), W_{1N}),$$

and $W_{0N}$ instead of $W_0(W_{1N}, \delta)$.

Suppose that, for sufficiently large N, $B(W_{1N})$ is bounded below by a positive number. Hence, for sufficiently large N, the probability of rejecting $H_1$ when it is true is bounded below by a positive number. Moreover, since $B \le \dfrac{W_0g_0}{W_1g_1} \le A$, it follows that, for N sufficiently large, $\dfrac{W_{0N}\,g_0}{W_{1N}\,g_1}$ is bounded above and below by positive constants. Thus, for large N the average risk of the test defined by $B(W_{1N})$, $\delta$, is greater than $u g_1 W_{1N}$, where u is a positive constant which does not depend on N. Moreover, from the definition of $B(W_{1N})$, this risk is a minimum.

Let $\varepsilon$ be a positive number such that $\varepsilon(g_0W_{0N} + g_1W_{1N}) < \tfrac{u}{2}\,g_1W_{1N}$ for N sufficiently large. Let $V_1$, $V_2$, with $0 < V_1 < 1 < V_2$, be two constants such that, for the sequential probability ratio test determined by them, both $\alpha_0$ and $\alpha_1$ are $< \varepsilon$. Of course $E_0n$ and $E_1n$ are finite and determined by the test. For this test the average risk is less than

$$\varepsilon(g_0W_{0N} + g_1W_{1N}) + cg_0E_0n + cg_1E_1n < \frac{u}{2}\,g_1W_{1N} + cg_0E_0n + cg_1E_1n < u\,g_1W_{1N},$$

for $W_{1N}$ large enough. This however contradicts the fact that the minimum risk is $> u g_1 W_{1N}$, and proves the lemma.

7. Proof of the theorem. Let a given sequential probability ratio test $S_0$ be defined by B*, A*; B* < 1 < A*. Let $\alpha_i(S_0)$ be the probability, according to $S_0$, of rejecting $H_i$ when it is true. Let c be fixed.

By Lemma 4, B is a continuous function of $W_0$ and $W_1$. Let $\delta = A^*$ in Lemma 8. Then there exists a pair $\bar W_0$, $\bar W_1$, with $\bar W_0 = W_0(\bar W_1, A^*)$, such that

$$A(\bar W_0, \bar W_1) = A^*, \qquad B(\bar W_0, \bar W_1) = B^*.$$

Hence the average risk

$$\sum_i g_i\,[\bar W_i\,\alpha_i(S_0) + cE_i^0(n)],$$

corresponding to the sequential test $S_0$, is a minimum.

Now let $S_1$ be any other test for deciding between $H_0$ and $H_1$ and such that $\alpha_i(S_1) \le \alpha_i(S_0)$, and $E_i^1(n)$ exists (i = 0, 1).

Then

$$\sum_i g_i\,[\bar W_i\,\alpha_i(S_0) + cE_i^0(n)] \le \sum_i g_i\,[\bar W_i\,\alpha_i(S_1) + cE_i^1(n)].$$

Since $\alpha_i(S_1) \le \alpha_i(S_0)$, we have

$$\sum_i g_iE_i^0(n) \le \sum_i g_iE_i^1(n).$$

Now $g_0$, $g_1$ were arbitrarily chosen (subject, of course, to the obvious restrictions). Hence it must be that

$$E_i^0(n) \le E_i^1(n).$$

This, however, is the desired result.
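The theorem can be illustrated, though of course not proved, by one exact computation. The Python sketch below assumes Bernoulli observations and compares a sequential probability ratio test with the smallest fixed-sample test whose two error probabilities do not exceed those of the sequential test. Both function names and the numerical settings are choices made here for illustration.

```python
import math


def sprt_exact(p0, p1, a_star, b_star, p_true, max_n=2000):
    """Return (P(accept H1), E(n)) for the test under Bernoulli(p_true).

    The distribution of (successes, failures) over paths still sampling is
    propagated step by step; mass left after max_n steps is neglected.
    """
    up = math.log(p1 / p0)
    down = math.log((1 - p1) / (1 - p0))
    log_a, log_b = math.log(a_star), math.log(b_star)
    alive = {(0, 0): 1.0}
    accept_h1 = expected_n = 0.0
    for n in range(1, max_n + 1):
        nxt = {}
        for (s, f), pr in alive.items():
            for s2, f2, q in ((s + 1, f, p_true), (s, f + 1, 1 - p_true)):
                mass = pr * q
                llr = s2 * up + f2 * down        # log probability ratio
                if llr >= log_a:
                    accept_h1 += mass
                    expected_n += n * mass
                elif llr <= log_b:
                    expected_n += n * mass
                else:
                    nxt[(s2, f2)] = nxt.get((s2, f2), 0.0) + mass
        alive = nxt
        if not alive:
            break
    return accept_h1, expected_n


def fixed_sample_size(p0, p1, alpha0, alpha1, n_max=2000):
    """Smallest n admitting a rule 'accept H1 iff successes >= t' (p1 > p0)
    whose error probabilities are at most alpha0 and alpha1."""
    for n in range(1, n_max + 1):
        pmf0 = [math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
        pmf1 = [math.comb(n, k) * p1**k * (1 - p1)**(n - k) for k in range(n + 1)]
        for t in range(n + 2):
            if sum(pmf0[t:]) <= alpha0 and sum(pmf1[:t]) <= alpha1:
                return n
    return None


alpha0, e0 = sprt_exact(0.4, 0.6, 19.0, 1 / 19, p_true=0.4)
acc1, e1 = sprt_exact(0.4, 0.6, 19.0, 1 / 19, p_true=0.6)
alpha1 = 1.0 - acc1
n_fixed = fixed_sample_size(0.4, 0.6, alpha0, alpha1)
```

In this example the ratio moves by a factor 1.5 per observation, so the boundaries are reached after a net excess of eight successes or failures, and both expected sample sizes `e0` and `e1` fall well below `n_fixed`.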

REFERENCES

[1] A. Wald, "Sequential tests of statistical hypotheses", Annals of Math. Stat., Vol. 16 (1945), pp. 117-186.

[2] A. Wald, Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.

[3] Charles Stein, "A note on cumulative sums", Annals of Math. Stat., Vol. 17 (1946), pp. 498-499.



LIMITING DISTRIBUTION OF A ROOT OF A DETERMINANTAL EQUATION

By D. N. Nanda

Institute of Statistics, University of North Carolina

1. Summary. The exact distribution of a root of a determinantal equation when the roots are arranged in a monotonic order was obtained by S. N. Roy [3] in 1943. A different method for deriving the distribution of any one of these roots has been described by the author in [2]. In the present paper the limiting forms of these distributions are obtained. This paper gives a method by which the limiting distributions can be obtained without undergoing an inordinate amount of mathematical labor.

2. Introduction. If $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ are two p-variate sample matrices with $n_1$ and $n_2$ degrees of freedom and $S = \|xx'\|/n_1$ and $S^* = \|x^*x^{*\prime}\|/n_2$ are the covariance matrices which under the null hypothesis are independent estimates of the same population covariance matrix, then the joint distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1S$ and $B = n_2S^*$, was obtained by Hsu [1] in 1939 and is

$$(1)\qquad R(l, \mu, \nu) = c(l, \mu, \nu) \prod_{i=1}^{l} \theta_i^{(\mu - 2)/2}(1 - \theta_i)^{(\nu - 2)/2} \prod_{i<j} (\theta_i - \theta_j), \qquad (0 \le \theta_l \le \cdots \le \theta_1 \le 1),$$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$ and $\nu = n_2 - p + 1$. The distribution density may be expressed as

$$(2)\qquad R(l, m, n) = c(l, m, n) \prod_{i=1}^{l} [\theta_i^m(1 - \theta_i)^n] \prod_{i<j} (\theta_i - \theta_j),$$

where $m = \mu/2 - 1$ and $n = \nu/2 - 1$.

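3. Method. Let $\theta_i = \zeta_i/n$ in (2). The joint distribution reduces to

$$(3)\qquad \frac{c(l, m, n)}{n^{\,l + lm + l(l-1)/2}} \prod_{i=1}^{l} [\zeta_i^m(1 - \zeta_i/n)^n] \prod_{i<j} (\zeta_i - \zeta_j)\, d\zeta_1 \cdots d\zeta_l, \qquad (0 \le \zeta_l \le \cdots \le \zeta_1 \le n).$$

As n tends to infinity the limit of (3) is

$$(4)\qquad K(l, m) \prod_{i=1}^{l} \zeta_i^m e^{-\zeta_i} \prod_{i<j} (\zeta_i - \zeta_j), \qquad (0 \le \zeta_l \le \cdots \le \zeta_1 < \infty).$$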

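The value of K(l, m) is

$$K(l, m) = \lim_{n \to \infty} \frac{c(l, m, n)}{n^{\,l + lm + l(l-1)/2}} = \lim_{n \to \infty} \frac{\pi^{l/2} \prod_{i=1}^{l} \Gamma\!\left(\frac{2m + 2n + l + i + 2}{2}\right)}{n^{\,lm + l(l+1)/2} \prod_{i=1}^{l} \Gamma\!\left(\frac{2m + i + 1}{2}\right) \Gamma\!\left(\frac{2n + i + 1}{2}\right) \Gamma\!\left(\frac{i}{2}\right)}.$$

By using Stirling's approximation for gamma functions and after simplification we get

$$\lim_{n \to \infty} \frac{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + 2n + l + i + 2}{2}\right)}{n^{\,lm + l(l+1)/2} \prod_{i=1}^{l} \Gamma\!\left(\frac{2n + i + 1}{2}\right)} = 1.$$

Hence

$$(5)\qquad K(l, m) = \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\!\left(\frac{2m + i + 1}{2}\right) \Gamma\!\left(\frac{i}{2}\right)},$$

and therefore

$$K(2, m) = 2^{2m+1}/\Gamma(2m + 2),$$

$$K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)],$$

$$K(4, m) = 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)],$$

$$K(5, m) = 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].$$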
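The agreement between the product-of-gammas form of K(l, m) and the four closed forms can be checked numerically. In the Python sketch below the function names are hypothetical, and the general formula is the one read from equation (5), $K(l, m) = \pi^{l/2} / \prod_{i=1}^{l} \Gamma((2m+i+1)/2)\,\Gamma(i/2)$.

```python
import math


def K_general(l, m):
    """K(l, m) from the product-of-gammas form of equation (5)."""
    denom = 1.0
    for i in range(1, l + 1):
        denom *= math.gamma((2 * m + i + 1) / 2) * math.gamma(i / 2)
    return math.pi ** (l / 2) / denom


def K_closed(l, m):
    """Closed forms of K(l, m) listed for l = 2, 3, 4, 5."""
    G = math.gamma
    if l == 2:
        return 2 ** (2 * m + 1) / G(2 * m + 2)
    if l == 3:
        return 2 ** (2 * m + 3) / (G(m + 1) * G(2 * m + 3))
    if l == 4:
        return 2 ** (4 * m + 5) / (G(2 * m + 2) * G(2 * m + 4))
    if l == 5:
        return 2 ** (4 * m + 9) / (3 * G(m + 1) * G(2 * m + 3) * G(2 * m + 5))
    raise ValueError("closed form listed only for l = 2, ..., 5")
```

For instance, with m = 0 both forms give K(2, 0) = 2, K(3, 0) = 4, K(4, 0) = 16/3 and K(5, 0) = 32/9.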

Let

$$(6)\qquad G_{l,m}(x) = K(l, m) \int_{0 \le \zeta_l \le \cdots \le \zeta_1 \le x} \prod_{i=1}^{l} \zeta_i^m e^{-\zeta_i} \prod_{i<j} (\zeta_i - \zeta_j) \prod_{i=1}^{l} d\zeta_i.$$

It can easily be observed that

$$G_{l,m}(x) = \Pr(\zeta_1 \le x) = \lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\!\left(\theta_1 \le \frac{x}{n}\right).$$

Thus the limiting form of the distribution of the largest root can be obtained by integrating the density given in (4) according to the method described by the author in [2]. It is, however, observed that the mathematical labor is reduced considerably by adopting the following method.

Referring to the results of the exact distribution of the largest root given in [2], let $F_{l,m,n}(x) = (0, l, l - 1, \ldots, 1, x; m, n)$; thus $F_{2,m,n}(x) = (0, 2, 1, x; m, n)$ and $F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n)$. Then $c(l, m, n)F_{l,m,n}(x)$ is the probability that none of the roots $\theta_i$ exceeds x, and is thus the cumulative distribution function of the greatest root. We shall show that $\lim_{n \to \infty} c(l, m, n)F_{l,m,n}(x/n) = G_{l,m}(x)$. The reader is, however, asked to refer to [2] for the detailed explanation of the notations and certain mathematical operations used in this paper.

4. Limiting distribution of the largest root. We will derive the distribution of the largest root for l = 2 and 3 by the two methods. A straightforward method will be named Method A. A second method, which proves to be very simple and easy, will be called Method B.

(a) l = 2.

(i) Method A. We have

$$\Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \int_{0 \le \zeta_2 \le \zeta_1 \le x} (\zeta_1\zeta_2)^m e^{-(\zeta_1 + \zeta_2)} (\zeta_1 - \zeta_2)\, d\zeta_1\, d\zeta_2.$$

By using the method described in [2], we have

$$G_{2,m}(x) = K(2, m)\,\{T_0^x(y, 1, x; m + 1) - T_0^x(0, 1, y; m + 1)\},$$

where

$$T_a^b\, g(y) = \int_a^b g(y)\, y^m e^{-y}\, dy,$$

and

$$(7)\qquad (a, 1, b; m + 1) = \int_a^b \zeta^{m+1} e^{-\zeta}\, d\zeta = -\zeta^{m+1} e^{-\zeta} \Big|_a^b + (m + 1)(a, 1, b; m).$$

Hence,

$$G_{2,m}(x) = K(2, m)\, T_0^x[-x^{m+1}e^{-x} + y^{m+1}e^{-y} + (m + 1)(y, 1, x; m) + y^{m+1}e^{-y} - (m + 1)(0, 1, y; m)] = K(2, m)\, T_0^x[2y^{m+1}e^{-y} - x^{m+1}e^{-x}],$$

as $T_0^x[(y, 1, x; m) - (0, 1, y; m)] = 0$.

Therefore

$$(8)\qquad \lim_{n \to \infty} \Pr(n\theta_1 \le x) = G_{2,m}(x) = K(2, m) \left\{ 2\int_0^x y^{2m+1}e^{-2y}\, dy - x^{m+1}e^{-x} \int_0^x y^m e^{-y}\, dy \right\}.$$

When $x \to \infty$, $G_{2,m}(x) = 1$; hence $K(2, m) = 2^{2m+1}/\Gamma(2m + 2)$, the value given in (5).
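Formula (8) is easy to evaluate numerically. The Python sketch below is an illustration, not part of the paper; it uses a composite Simpson rule (a choice made here, with hypothetical function names) and assumes $m \ge 0$. For m = 0 the two integrals are elementary and (8) reduces to $1 - e^{-2x} - 2xe^{-x}$, which gives an independent check.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def G2(x, m):
    """Right-hand side of (8): limiting CDF of n*theta_1 for l = 2."""
    K2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    first = simpson(lambda y: y ** (2 * m + 1) * math.exp(-2 * y), 0.0, x)
    second = simpson(lambda y: y ** m * math.exp(-y), 0.0, x)
    return K2 * (2 * first - x ** (m + 1) * math.exp(-x) * second)
```

Evaluating `G2` at a large argument also confirms the normalization $G_{2,m}(\infty) = 1$ used to identify K(2, m).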

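Now we shall derive the result by Method B.

(ii) Method B.

$$F_{2,m,n}(x) = (0, 2, 1, x; m, n) = \frac{1}{m + n + 2} \left\{ 2\int_0^x y^{2m+1}(1 - y)^{2n+1}\, dy - x^{m+1}(1 - x)^{n+1} \int_0^x y^m(1 - y)^n\, dy \right\},$$

a result given in [2].

Replacing x by x/n, we get

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2} \left\{ 2\int_0^{x/n} y^{2m+1}(1 - y)^{2n+1}\, dy - (x/n)^{m+1}(1 - x/n)^{n+1} \int_0^{x/n} y^m(1 - y)^n\, dy \right\};$$

also, letting y = u/n, we have

$$(0, 2, 1, x/n; m, n) = \frac{1}{(m + n + 2)\,n^{2m+2}} \left\{ 2\int_0^x u^{2m+1}(1 - u/n)^{2n+1}\, du - x^{m+1}(1 - x/n)^{n+1} \int_0^x u^m(1 - u/n)^n\, du \right\}.$$

Hence

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr(\theta_1 \le x/n) = \lim_{n \to \infty} c(2, m, n)(0, 2, 1, x/n; m, n) = \frac{2^{2m+1}}{\Gamma(2m + 2)} \left\{ 2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \right\},$$

which is the same as (8), obtained by Method A.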
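The passage to the limit in Method B can be watched numerically. The Python sketch below is illustrative only: it evaluates $c(2, m, n)F_{2,m,n}(x/n)$ for finite n and compares it with the limit. The constant c(2, m, n) is taken here to be the product-of-gammas expression with l = 2, computed through `math.lgamma` to avoid overflow; the function names and the Simpson quadrature are assumptions of this sketch.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def log_c2(m, n):
    """log c(2, m, n), from the product-of-gammas constant with l = 2."""
    total = math.log(math.pi)
    for i in (1, 2):
        total += (math.lgamma((2 * m + 2 * n + i + 4) / 2)
                  - math.lgamma((2 * m + i + 1) / 2)
                  - math.lgamma((2 * n + i + 1) / 2)
                  - math.lgamma(i / 2))
    return total


def exact_cdf(x, m, n):
    """Pr(n*theta_1 <= x) = c(2, m, n) * F_{2,m,n}(x/n) for finite n."""
    t = x / n
    a = simpson(lambda y: y ** (2 * m + 1) * (1 - y) ** (2 * n + 1), 0.0, t)
    b = simpson(lambda y: y ** m * (1 - y) ** n, 0.0, t)
    F = (2 * a - t ** (m + 1) * (1 - t) ** (n + 1) * b) / (m + n + 2)
    return math.exp(log_c2(m, n)) * F


def limit_cdf(x, m):
    """Limiting form: K(2, m) times the bracket with exponential kernels."""
    K2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    a = simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    b = simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    return K2 * (2 * a - x ** (m + 1) * math.exp(-x) * b)


err_small_n = abs(exact_cdf(1.0, 0, 50) - limit_cdf(1.0, 0))
err_large_n = abs(exact_cdf(1.0, 0, 5000) - limit_cdf(1.0, 0))
```

The discrepancy shrinks roughly in proportion to 1/n, as one would expect from replacing $(1 - u/n)^n$ by $e^{-u}$.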
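(b) l = 3.

(i) Method A. We have

$$\Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \int_{0 \le \zeta_3 \le \zeta_2 \le \zeta_1 \le x} \prod_{i=1}^{3} \zeta_i^m e^{-\zeta_i} \prod_{i<j} (\zeta_i - \zeta_j) \prod_{i=1}^{3} d\zeta_i$$

$$= K(3, m) \int_{0 \le \zeta_3 \le \zeta_2 \le \zeta_1 \le x} (\zeta_1\zeta_2\zeta_3)^m e^{-(\zeta_1 + \zeta_2 + \zeta_3)} \{1, 2, 3\}\, d\zeta_1\, d\zeta_2\, d\zeta_3,$$

where $\{1, 2, 3\} = \zeta_1\zeta_2\{1, 2\} + \zeta_3\zeta_1\{3, 1\} + \zeta_2\zeta_3\{2, 3\}$, as given in [2].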





Gs.mCa;) = Ki^, 1^) (j 


^o<fa<f2<fi<* •'o<ri<rs<rj<’: 


+ f [ 1,2) dri dd , 

•'ll <f2<:fi<f«<* J 

= 7!C(3,m)(rr%, 2,l,x]m + 1) 

+ n'\0, 2, y, 1. a:; m + 1) + n'%0, 2, l,y,m + 1)}, 

where 

(a, 2, l,b, m) = f (Tic^fi dtz. 

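We have already obtained

$$(0, 2, 1, x; m) = G_{2,m}(x)/K(2, m) = 2\int_0^x y^{2m+1}e^{-2y}\, dy - x^{m+1}e^{-x} \int_0^x y^m e^{-y}\, dy,$$

as given in (8).

We also need the following results, which are obtained by the method described for l = 2:

$$(10)\qquad (a, 2, 1, b; m) = 2\int_a^b u^{2m+1}e^{-2u}\, du - (a^{m+1}e^{-a} + b^{m+1}e^{-b}) \int_a^b u^m e^{-u}\, du,$$

$$(11)\qquad (a, 2, b, 1, c; m) = b^{m+1}e^{-b} \int_a^c u^m e^{-u}\, du - a^{m+1}e^{-a} \int_b^c u^m e^{-u}\, du - c^{m+1}e^{-c} \int_a^b u^m e^{-u}\, du.$$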

Using these results we have 


dw 


U,,„(a:) = K(3,m)T7’^{2 du - (y”’+V'' + a;”+'0 f’ 

[ Jy Jv 


f* dw + r du + 2 f' 

Jo Jo Jo 


^2,n+ag-2« 


- dw . 


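Simplifying we get

$$(12)\qquad \lim_{n \to \infty} \Pr(n\theta_1 \le x) = G_{3,m}(x) = K(3, m) \left\{ 2\int_0^x u^{2m+3}e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2\int_0^x u^{m+1}e^{-u}\, du \int_0^x u^{2m+2}e^{-2u}\, du - x^{m+2}e^{-x} \left[ 2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \right] \right\},$$

where $K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)]$.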



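(ii) Method B.

$$F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n) = \frac{1}{m + n + 3}\,[\,2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\,],$$

a result given in [2].

Replacing x by x/n and putting u/n for the variate y of integration, we have

$$F_{3,m,n}(x/n) = (0, 3, 2, 1, x/n; m, n) = \frac{1}{m + n + 3} \left\{ \frac{1}{n^{3m+5}} \left[ 2\int_0^x u^{2m+3}(1 - u/n)^{2n+1}\, du \int_0^x u^m(1 - u/n)^n\, du - 2\int_0^x u^{m+1}(1 - u/n)^n\, du \int_0^x u^{2m+2}(1 - u/n)^{2n+1}\, du \right] - \frac{x^{m+2}(1 - x/n)^{n+1}}{n^{3m+4}(m + n + 2)} \left[ 2\int_0^x u^{2m+1}(1 - u/n)^{2n+1}\, du - x^{m+1}(1 - x/n)^{n+1} \int_0^x u^m(1 - u/n)^n\, du \right] \right\}.$$

Hence

$$\lim_{n \to \infty} \Pr(n\theta_1 \le x) = \lim_{n \to \infty} \Pr\!\left(\theta_1 \le \frac{x}{n}\right) = \lim_{n \to \infty} c(3, m, n)F_{3,m,n}(x/n)$$

$$= K(3, m) \left\{ 2\int_0^x u^{2m+3}e^{-2u}\, du \int_0^x u^m e^{-u}\, du - 2\int_0^x u^{m+1}e^{-u}\, du \int_0^x u^{2m+2}e^{-2u}\, du - x^{m+2}e^{-x} \left[ 2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \right] \right\},$$

where

$$K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)].$$

This result is the same as (12) obtained by Method A.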
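The limiting distribution for l = 3 can likewise be evaluated numerically. The Python sketch below is an illustration under the reading of (12) in which every integral runs from 0 to x; the helper names and the Simpson quadrature are choices made here, and $m \ge 0$ is assumed.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def inc(power, rate, x):
    """Integral of u**power * exp(-rate*u) over [0, x]."""
    return simpson(lambda u: u ** power * math.exp(-rate * u), 0.0, x)


def G3(x, m):
    """Limiting CDF of n*theta_1 for l = 3, as read from (12)."""
    K3 = 2 ** (2 * m + 3) / (math.gamma(m + 1) * math.gamma(2 * m + 3))
    g2_bracket = (2 * inc(2 * m + 1, 2, x)
                  - x ** (m + 1) * math.exp(-x) * inc(m, 1, x))
    value = (2 * inc(2 * m + 3, 2, x) * inc(m, 1, x)
             - 2 * inc(m + 1, 1, x) * inc(2 * m + 2, 2, x)
             - x ** (m + 2) * math.exp(-x) * g2_bracket)
    return K3 * value
```

As a check on the constant, letting x grow in `G3` should give a value close to 1, since the bracketed boundary term vanishes and the remaining products of gamma integrals equal 1/K(3, m).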

We have thus shown that Method B is applicable for obtaining the limiting forms of the distribution of the largest root and that it is much simpler as compared to the straightforward method called Method A here.

The limiting distributions for the largest root for l = 4 and 5 are listed below.

(c) l = 4.

lim Pr (n0i < x) = lim Pr = Gi^mix) 


= Kii, m) |2 IJ 

2w-h2 y 7n-]2 —X I w —u 

' u e du — X G I u e 
_ 0 

Jo 




lu - 2 f" R du 

K{2, m) Jo 

1 —2u ] . Gi,m+l(s/) _ ^,m+3^— 2 ; (?3,m(j')\ 

® A(2, m -b 1) K{3, m)j ’ 


du 





where

$$K(4, m) = 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)].$$

(d) l = 5.

lim Pr {fWi < a;) = lim Pr (e, <-) = G,,M) 

n-*oo \ nj 




;c(3 


^ _ 2 r 

I m) Jo 


.,2m+e —2« , 

u e du 


= K{5, m) -12 jf 

■ i - 2 jf V”+* e ■ 

• [ 1 I G'a.^Ca;) "I 


JO Jo 


where 


+ 2 / 

Jo 

-2 [ «'”+=■ e''“du /V+-^-“dM - 

Jo Jo 

■ F^i c"* f e““ du + (m + 2) , 

- 2 ~ T™+%- ^1^)]. 

/C(3, m + l)Jo jc(4, m)) ’ 


$$K(5, m) = 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].$$


5. Limiting distribution of the smallest root. It was shown in [2] that the exact distribution of the smallest root can be obtained by using the relation

$$\Pr(\theta_l \le x) = 1 - \Pr(\theta_1 \le 1 - x \mid \nu, \mu).$$

This relation, however, does not help in obtaining the limiting distribution of the smallest root from that of the largest root. The limiting distribution of the smallest root can be obtained by the method illustrated below.

(a) l = 2.

The exact distribution of the smallest root $\theta_2$ can be expressed as

$$\Pr(\theta_2 \le x) = c(2, m, n)\,\{(0, 2, 1, x; m, n) + (0, 2, x, 1, z; m, n)\},$$

where z = 1. Replacing x by x/n, we get

$$\Pr(\theta_2 \le x/n) = c(2, m, n)\,\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\},$$

where

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2} \left\{ 2\int_0^{x/n} y^{2m+1}(1 - y)^{2n+1}\, dy - (x/n)^{m+1}(1 - x/n)^{n+1} \int_0^{x/n} y^m(1 - y)^n\, dy \right\}$$

and

$$(0, 2, x/n, 1, z; m, n) = \frac{1}{m + n + 2} \left\{ (0, x/n; m + 1, n + 1) \int_0^z y^m(1 - y)^n\, dy - (0, z; m + 1, n + 1) \int_0^{x/n} y^m(1 - y)^n\, dy \right\},$$

as obtained from (6) of [2].

The limiting distribution of $\theta_2$ is

$$(13)\qquad \lim_{n \to \infty} \Pr(\theta_2 \le x/n) = \lim_{n \to \infty} c(2, m, n)\,\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\}.$$

Putting u/n for y, the variate of integration, and allowing n to tend to infinity, we have

$$\lim_{n \to \infty} c(2, m, n)(0, 2, 1, x/n; m, n) = K(2, m) \left\{ 2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du \right\},$$

and

$$\lim_{n \to \infty} c(2, m, n)(0, 2, x/n, 1, z; m, n) = K(2, m)\, x^{m+1}e^{-x} \int_0^\infty u^m e^{-u}\, du = K(2, m)\, x^{m+1}e^{-x}\, \Gamma(m + 1).$$

Substituting these results in (13) we have

$$\lim_{n \to \infty} \Pr(n\theta_2 \le x) = \lim_{n \to \infty} \Pr(\theta_2 \le x/n) = K(2, m) \left\{ 2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x} \int_0^x u^m e^{-u}\, du + x^{m+1}e^{-x}\, \Gamma(m + 1) \right\},$$

where

$$K(2, m) = 2^{2m+1}/[\Gamma(2m + 2)].$$
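The limiting distribution of the smallest root for l = 2 admits a simple check. For m = 0 the limiting joint density is $2e^{-(\zeta_1 + \zeta_2)}(\zeta_1 - \zeta_2)$, and shifting both variables by x shows that $\Pr(\zeta_2 > x) = e^{-2x}$, so the formula should reduce to $1 - e^{-2x}$. The Python sketch below, with hypothetical function names and a Simpson quadrature chosen here, evaluates the formula and can be compared with that value.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def smallest_root_cdf(x, m):
    """Limiting CDF of n*theta_2 for l = 2 (m >= 0 assumed)."""
    K2 = 2 ** (2 * m + 1) / math.gamma(2 * m + 2)
    first = simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    second = simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    edge = x ** (m + 1) * math.exp(-x)
    return K2 * (2 * first - edge * second + edge * math.gamma(m + 1))
```

Since $\Gamma(m+1)$ is never smaller than the incomplete integral $\int_0^x u^m e^{-u}\,du$, this function is never below the distribution function of the largest root, as it must be.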

(b) l = 3.

The exact distribution of the smallest root $\theta_3$ can be expressed as

$\Pr(\theta_3 < x) = c(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n) + (0, 3, x, 2, 1, z; m, n)]$,

where z = 1.

Replacing x by x/n and allowing n to tend to infinity we have

(14) $\lim_{n\to\infty} \Pr(n\theta_3 < x) = \lim_{n\to\infty} c(3, m, n)[(0, 3, 2, 1, x/n; m, n) + (0, 3, 2, x/n, 1, z; m, n) + (0, 3, x/n, 2, 1, z; m, n)]$.

The values of the components on the right hand side of the above equation
are given below.

$\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n) = G_{3,m}(x)$, given by (12),


lim c(3, m, r^)(0, 3, 2, x/n, 1, z, m, n) 




_»n+l —® 
— X G 


= 2f(3, m) IJV du ^2 jf 






du 


+ u”+’ e~“ du u" fi"“ dii 


-2 I u”'^*e"“du f u 

Jx Jo 

and 

lim c(3, m, ?i)(0, 3, x/n, 2,1, z; m, n) = 7f(3, m)< f u™ e~'‘ du [2 f 

fi—^ae [Jo L 


u^”+^-^“du 


2mH-3 —2u j 

u e du 


r® T 
j u c du 


du 


- /'V+'e"“dul - a;"+'e~*r2 fe'"”du - x'"+'e"" / Vc 

+ fV+‘r“du ru’^e-'^du - 2 ru”'*\-^du fV‘+%-““duV 

•0 J® vO v ® J 

Substituting in (14) we have, 

lim Pr (u ^3 < x) = (2'”‘+V[r(m + l)r(2m + 3)]) 

n-too 

•(2 f e-'“dM fVe-“du + 2 /’V"+'e"'“du ru“e"“du 
Jo Jo Jo J* 

- 2 [ u’'‘+'e"’‘du /’V”‘+'e~"“du - 2 fir+^e-^du ru''"+'e“'“ d^ 
Jo Jo Jo J» 

n_nt+2 —a: f 2m+l “2 m j O.v.^+2 ~x f „, 2 tn+l —2m 7 

— .6X 6 u e du — 2x e u e du 

Jo Jo 

+ 2mH'3 —2® / f T». —M 7 I r r?i —u 

X e ljuedu + jue 


2mH'3 —2® / / T». —M 




lim Pr (jidi < x) = 2“’'‘+V[r(w + l)r(2m + 3)] 


u e du -h u e du 


jT (2 m -f- 4) I" m ^~u J.. I n f . 2tn+l -2ii , f m , 

'1— 22777 + 4 —' Jq ^ ^ au + 2 j u e du j u e du 

- 2r(m + 2) fV'"'^-'“du - 2 fV+'e““ du 
Jo Jo J a 

- jfV^+'e-'" du + r( 2 u + l)x’ 

+ jVe-“ duj. 


,2ff4+3 g~2® 





Thus we have seen that this method can be used for obtaining the limiting
distribution of the smallest root for any value of l.


6. Limiting distribution of any intermediate root. The above method can
also be used for obtaining the limiting distribution of any intermediate root.
We shall give the distribution of $\theta_2$ for l = 3. We have

(16) $\Pr(\theta_2 < x) = c(3, m, n)\{(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)\}$,

where z = 1.

The limits $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n)$ and $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$
are given by (12) and (15) respectively. Substituting these results
in (16) and simplifying we get


lim Pr (n 02 < a;) = 

n->« 


r(m + l)r(2m + 3) 



du 


Jo 


—2u 

e 


du — 2 



du [ e du 
Jo 




Jo Jo 

r du r du - r du f u”‘^'e 

J® Jq Jjt Jo 


m41 —w 



Oi, 

22m+3 f jiX 

i™ Pl'Ma < a;) = r(m + l)r(2m + 3) i 

- 2r(m + 2) r da - 4r:“+^e~* du + 2x'”+^e-“" 

Jo Jo 

■ r du +r r du r u’u-'^ du 

Jo LJa? Jo 


J u’^ e du j u"'^^ e “ da |. 


Thus the limiting distribution of any intermediate root can be obtained by the
above method.


7. Further problems. The limiting distribution of the largest root is found
to be very helpful in obtaining the distribution of the sum of roots when m = 0.
This condition implies that when the results are applied to canonical correlations
the numbers of variates in the two sets differ by unity. The distributions for
the sum of roots have been derived under the above condition for l = 2, 3 and 4
and the results are being presented in the next paper of this series.

8. Acknowledgements. I am extremely thankful to Drs. P. L. Hsu and
Harold Hotelling for guidance and help in this research.





REFERENCES 

[1] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of
Eugenics, Vol. 9 (1939), pp. 250-258.

[2] D. N. Nanda, "Distribution of a root of a determinantal equation," Annals of Math.
Stat., Vol. 18 (1947).

[3] S. N. Roy, "The individual sampling distribution of the maximum, the minimum and
any intermediate of the 'p' statistics on the null hypothesis," Sankhyā, Vol. 8
(1943).



ON A SOURCE OF DOWNWARD BIAS IN THE ANALYSIS OF VARIANCE 

AND COVARIANCE 

By William G. Madow 

Institute of Statistics, University of North Carolina

1. Summary. It is shown that if, in the analysis of variance, the experiments
are not in a state of statistical control due to variations in the true means, then
the test will have a downward bias. The power function of the analysis of variance
test is obtained when this downward bias is present.

2. Introduction. To introduce the discussion of this bias let us consider the 
generalized Student’s hypothesis. 

Let $y_{11}, \ldots, y_{kN}$ be normally and independently distributed with variance
$\sigma^2$ and let the expected value of $y_{i\nu}$ be $a_{i\nu}$. Then the generalized Student's
hypothesis is

(Null hypothesis) $a_{i\nu} = a$

and the class of alternative hypotheses against which the null hypothesis is
tested is

(Class A) $a_{i\nu} = a_i$.

From the statement of the null hypothesis and the alternatives of Class A it
follows that both the null hypothesis and the alternatives of Class A require that

(1.1) $a_{i1} = \cdots = a_{iN}$.

Since our experiments are rarely in such perfect statistical control that (1.1)
holds whether or not the null hypothesis is true, it becomes reasonable to investigate
the existing F test when, instead of the alternatives to the null hypothesis
being of Class A, they are simply of Class B:

(Class B) Equation (1.1) is false for at least one value of i.

Furthermore, for many practical purposes we would prefer to test the average
null hypothesis:

(Average null hypothesis) $\bar a_i = \bar a$,

where $N\bar a_i = a_{i1} + \cdots + a_{iN}$ and $k\bar a = \bar a_1 + \cdots + \bar a_k$, instead of the null
hypothesis, the alternatives to the average null hypothesis being of Class C:

(Class C) The $a_{i\nu}$ can have any values such that not all the $\bar a_i$ equal $\bar a$.

Throughout this paper the letter i will assume all integral values from 1 to k, the letters
$\mu$, $\nu$ will assume all integral values from 1 to N, the letter $\gamma$ will assume all integral values
from 1 to m, the letter $\alpha$ will assume all integral values from $n_1 + \cdots + n_{\gamma-1} + 1$ to
$n_1 + \cdots + n_\gamma$, ($n_0 = 0$), and $\alpha_1$, $\alpha_2$ will assume all integral values from 0 to $\infty$.



The F-test of the null hypothesis against the alternatives of Class A is, as is
well known,

$F = \dfrac{k(N - 1)\,N\sum_i (\bar y_i - \bar y)^2}{(k - 1)\sum_{i,\nu} (y_{i\nu} - \bar y_i)^2}$,

where $N\bar y_i = y_{i1} + \cdots + y_{iN}$ and $k\bar y = \bar y_1 + \cdots + \bar y_k$. To answer the questions
formulated above concerning the F-test when the average null hypothesis
or the alternatives of classes B or C are true, we must then calculate the distribution
of F under these various conditions. This is done in Section 3.

A somewhat informal means of obtaining the conclusions is that of studying
F itself. Taking the expected values of the numerator and denominator of F
and defining

$\phi_1^2 = \dfrac{N\sum_i (\bar a_i - \bar a)^2}{(k - 1)\sigma^2}$,

$\phi_2^2 = \dfrac{\sum_{i,\nu} (a_{i\nu} - \bar a_i)^2}{k(N - 1)\sigma^2}$,

we obtain as the ratio of the two expected values

$\bar F = \dfrac{1 + \phi_1^2}{1 + \phi_2^2}$.

It is well known that, in general, the larger the value of N the more closely will F
approximate $\bar F$. From this fact it is easy to see why if the null hypothesis is true,
then $F \sim 1$, whereas if the null hypothesis is false but an alternative of Class A
is true then

$F \sim 1 + \phi_1^2 > 1$

so that large values of F become more likely than if the null hypothesis were true.
However, if an alternative of Class B is true then

$F \sim \dfrac{1 + \phi_1^2}{1 + \phi_2^2}$

so that if $\phi_1^2 < \phi_2^2$, smaller values of F occur more frequently than indicated
by the null hypothesis. Thus we would tend to accept the null hypothesis more
frequently than desired when it is false. Even when the null hypothesis is false
so that $\phi_1^2 > 0$, the values of F will tend to be less if $\phi_2^2 > 0$ than if $\phi_2^2 = 0$
whether or not $\phi_1^2 < \phi_2^2$. Not only is the probability of an error of the first kind
less than the value $\epsilon$ we may have previously selected, but also the power of the
test is less than would be indicated by Tang's tables [1]. The lack of statistical
control represented by variation of expected values within a class has the
effect of making it less likely than the standard F-test indicates that the null
hypothesis will be rejected whether it be true or false. Furthermore, even for
relatively low values of $\phi_2^2$, the reductions in the probabilities of rejection may
be over 40 per cent as indicated by some examples given below.
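The informal argument can be made concrete with a small computation. The sketch below assumes the definitions $\phi_1^2 = N\sum_i(\bar a_i - \bar a)^2/((k-1)\sigma^2)$ and $\phi_2^2 = \sum_{i,\nu}(a_{i\nu} - \bar a_i)^2/(k(N-1)\sigma^2)$ and evaluates the ratio $(1 + \phi_1^2)/(1 + \phi_2^2)$ of expected mean squares for a table of true means. The function names and the two small tables are hypothetical illustrations.

```python
def phi_squared(means, sigma2):
    """Return (phi1_sq, phi2_sq) for a k x N table of true means a[i][v]."""
    k = len(means)
    n_per = len(means[0])
    row_avg = [sum(row) / n_per for row in means]
    grand = sum(row_avg) / k
    between = n_per * sum((a - grand) ** 2 for a in row_avg)
    within = sum((a - row_avg[i]) ** 2
                 for i, row in enumerate(means) for a in row)
    return between / ((k - 1) * sigma2), within / (k * (n_per - 1) * sigma2)

def expected_ratio(means, sigma2):
    """Ratio of the expected numerator and denominator mean squares of F."""
    phi1, phi2 = phi_squared(means, sigma2)
    return (1 + phi1) / (1 + phi2)

class_a = [[1.0, 1.0], [3.0, 3.0]]        # (1.1) holds, class means differ
average_null = [[0.0, 2.0], [2.0, 0.0]]   # class averages equal, (1.1) false

print(expected_ratio(class_a, 1.0), expected_ratio(average_null, 1.0))
```

In the second table the class averages agree, so $\phi_1^2 = 0$, yet the within-class variation of the means pulls the ratio below 1.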

If the average null hypothesis is true but (1.1) is false it follows that

$F \sim \dfrac{1}{1 + \phi_2^2}$,

so that the full effect of the downward bias occurs in that case. Thus in cases
where statistical control is lacking, to test the average null hypothesis by the F-test
may well result in accepting the hypothesis when it is false. If the null
hypothesis is rejected, however, then we can expect that the differences among
the true means are even larger than indicated by Tang's tables.

To illustrate, it is shown in Section 4 that if k = 5 and N = 7, then the probability
of rejecting the average null hypothesis when it is true, but (1.1) is false,
will not be the preassigned .05 but something less than .03 if $\phi_2^2 > .05$. Furthermore,
if $\phi_2^2 > .07$, then the power of the F tests for this example will be reduced
by at least 40 per cent whatever the value of $\phi_1^2$.

The conclusions reached above remain valid for the analysis of variance and
covariance in general. In the general case however, the value of the average
null hypothesis in simplifying the analysis may be considerably reduced since
the parameter $\phi_1^2$ no longer vanishes when the average null hypothesis is true.
For example, if $E y_\nu = p_\nu x_\nu$, and if the average null hypothesis is $\bar p = 0$, where
$N\bar p = p_1 + \cdots + p_N$, then upon calculating

$\phi_1^2 = \dfrac{\left(\sum_\nu p_\nu x_\nu^2\right)^2}{\sigma^2 \sum_\nu x_\nu^2}$

we see that $\phi_1^2$ will not vanish in general if $\bar p$ vanishes.

Although as shown above the average null hypothesis may not have too great
importance in the case of regression, yet if the "variance between treatments"
is a function of arithmetic means of the random variables as in the "pure"
analysis of variance the average null hypothesis may well be very useful. Simple
examples of this are provided by the randomized block, Latin square, and similar
designs.

The distributions that we shall need are given in Section 3. The inequalities
on the basis of which the bias is demonstrated are obtained in Section 4.

It would be highly desirable to have Tang's tables extended so that they might
provide the answers to the questions raised by this source of bias. In the absence
of such extensions the inequalities of Section 4 may give some rough
idea, but these inequalities are not sharp enough.


3. The calculation of the distributions. The following theorem was proved,
although not explicitly stated, as part of an earlier note [2]. (Note the change
from $x_\nu$ to $y_\nu$ as the notation for the random variable.)

THEOREM 1. Let $y_1, \ldots, y_N$ be normally and independently distributed with
variance $\sigma^2$ and means $a_1, \ldots, a_N$ and let $q_1, \ldots, q_m$ be quadratic forms

$q_\gamma = \sum_{\mu,\nu} a^{\gamma}_{\mu\nu}\, y_\mu y_\nu$

of ranks $n_1, \ldots, n_m$. Then, if an orthogonal transformation

$y_\nu = \sum_\mu c_{\nu\mu} z_\mu$

exists such that

(2.1) $q_\gamma = \sum_\alpha z_\alpha^2$,

it follows that the random variables $q_\gamma/\sigma^2$ are independently distributed in $\chi'^2$
distributions with degrees of freedom $n_1, \ldots, n_m$ and parameters $\lambda_1, \ldots, \lambda_m$, where

$\lambda_\gamma = \dfrac{1}{2\sigma^2}\sum_{\mu,\nu} a^{\gamma}_{\mu\nu}\, a_\mu a_\nu$.

Various conditions for the existence of an orthogonal transformation satisfying
(2.1) of Theorem 1 have been given. Among these are:

1. Cochran's [3] condition. If $\sum_\gamma q_\gamma = \sum_\nu y_\nu^2$ then a necessary and sufficient
condition for the existence of an orthogonal transformation satisfying (2.1)
is $\sum_\gamma n_\gamma = N$.

2. Craig's [4] condition. If $A_\gamma$ denotes the matrix $(a^{\gamma}_{\mu\nu})$ then a necessary
and sufficient condition for the existence of an orthogonal transformation satisfying
(2.1) is $A_\gamma A_\delta = \delta_{\gamma\delta} A_\gamma$ where $\delta_{\gamma\delta}$ is the null matrix if $\gamma \ne \delta$ and the identity
matrix if $\gamma = \delta$.

3. Linear Hypothesis condition (Kolodziejczyk [5]). If $\lambda$ be the likelihood
ratio test of a linear hypothesis and if $E^2 = 1 - \lambda^{2/N}$, then $E^2 = q_1/(q_1 + q_2)$
and an orthogonal transformation exists satisfying (2.1) with m = 2.

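To summarize some results obtained by Tang [1], let us state

THEOREM 2. If $\chi_1'^2$ and $\chi_2'^2$ are independently distributed in $\chi'^2$ distributions with
$n_1$ and $n_2$ degrees of freedom and parameters $\lambda_1$ and $\lambda_2$, and if

$E^2 = \dfrac{\chi_1'^2}{\chi_1'^2 + \chi_2'^2}$,

then the probability density of $E^2$ is

(2.2) $p = p(E^2 \mid \lambda_1, \lambda_2, n_1, n_2) = e^{-\lambda_1 - \lambda_2} \sum_{\alpha_1, \alpha_2} \dfrac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}}{\alpha_1!\,\alpha_2!}\, \dfrac{\Gamma\left(\frac{n_1 + n_2}{2} + \alpha_1 + \alpha_2\right)}{\Gamma\left(\frac{n_1}{2} + \alpha_1\right)\Gamma\left(\frac{n_2}{2} + \alpha_2\right)}\, (E^2)^{(n_1/2) + \alpha_1 - 1}(1 - E^2)^{(n_2/2) + \alpha_2 - 1}$.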
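The density of $E^2$ can be explored numerically. The sketch below assumes that the density is the Poisson-weighted double series of beta densities with weights $e^{-\lambda_1-\lambda_2}\lambda_1^{\alpha_1}\lambda_2^{\alpha_2}/(\alpha_1!\,\alpha_2!)$, truncates the series, and integrates by Simpson's rule. The degrees of freedom $n_1 = 4$, $n_2 = 6$, the parameter values and all function names are illustrative assumptions. It checks that the series integrates to about 1, and that with $\lambda_1 = 0$ and $\lambda_2 > 0$ the probability of exceeding the central 5 per cent point falls below .05, which is the downward bias under discussion.

```python
import math

def log_poisson(lam, a):
    """log of exp(-lam) * lam**a / a!, with the convention 0**0 = 1."""
    if lam == 0.0:
        return 0.0 if a == 0 else float("-inf")
    return -lam + a * math.log(lam) - math.lgamma(a + 1)

def density(e2, lam1, lam2, n1, n2, terms=30):
    """Truncated double series for the density of E^2 on 0 < e2 < 1."""
    if e2 <= 0.0 or e2 >= 1.0:
        return 0.0  # correct at the endpoints because n1, n2 > 2 here
    total = 0.0
    for a1 in range(terms):
        for a2 in range(terms):
            p, q = n1 / 2 + a1, n2 / 2 + a2
            total += math.exp(
                log_poisson(lam1, a1) + log_poisson(lam2, a2)
                + math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
                + (p - 1) * math.log(e2) + (q - 1) * math.log(1 - e2))
    return total

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule; n must be even."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def critical_value(eps, n1, n2):
    """Bisection for c with P(E^2 > c) = eps when lam1 = lam2 = 0."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        tail = simpson(lambda e: density(e, 0.0, 0.0, n1, n2, terms=1), mid, 1.0)
        lo, hi = (mid, hi) if tail > eps else (lo, mid)
    return (lo + hi) / 2

n1, n2, eps = 4, 6, 0.05
c = critical_value(eps, n1, n2)
total_mass = simpson(lambda e: density(e, 0.5, 1.0, n1, n2), 0.0, 1.0)
reject_biased = simpson(lambda e: density(e, 0.0, 1.0, n1, n2), c, 1.0)
print(c, total_mass, reject_biased)
```

The truncation at 30 terms is adequate only for small parameter values such as those used here.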





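By assigning certain values to $\lambda_1$ and $\lambda_2$ we obtain the following special cases
of (2.2):

(2.3) $p_1 = p(E^2 \mid \lambda_1, 0, n_1, n_2) = e^{-\lambda_1} \sum_{\alpha_1} \dfrac{\lambda_1^{\alpha_1}\, \Gamma\left(\frac{n_1 + n_2}{2} + \alpha_1\right)}{\alpha_1!\, \Gamma\left(\frac{n_1}{2} + \alpha_1\right)\Gamma\left(\frac{n_2}{2}\right)}\, (E^2)^{(n_1/2) + \alpha_1 - 1}(1 - E^2)^{(n_2/2) - 1}$,

(2.4) $p_2 = p(E^2 \mid 0, \lambda_2, n_1, n_2) = e^{-\lambda_2} \sum_{\alpha_2} \dfrac{\lambda_2^{\alpha_2}\, \Gamma\left(\frac{n_1 + n_2}{2} + \alpha_2\right)}{\alpha_2!\, \Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2} + \alpha_2\right)}\, (E^2)^{(n_1/2) - 1}(1 - E^2)^{(n_2/2) + \alpha_2 - 1}$,

(2.5) $p_0 = p(E^2 \mid 0, 0, n_1, n_2) = \dfrac{\Gamma\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)}\, (E^2)^{(n_1/2) - 1}(1 - E^2)^{(n_2/2) - 1}$.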


It is noted that (2.3) is Tang's distribution (112) upon which the calculations
of his tables were based. To see this we need only make the correspondence

This paper          Tang
$\lambda_1$         $\lambda$
$n_1, n_2$          $f_1, f_2$


We define $\epsilon$ to be the probability of an error of the first kind. Tang obtained
the critical values $E_\epsilon^2$ of $E^2$ by requiring that

$\int_{E_\epsilon^2}^{1} p_0\, dE^2 = \epsilon$, say .01 or .05.

Then he calculated

$P_{II} = \int_0^{E_\epsilon^2} p_1(E^2 \mid \lambda_1, 0, n_1, n_2)\, dE^2$

using the values of $E_\epsilon^2$ obtained above. Hence $1 - P_{II}$ is the power of the test.
If, however, $\lambda_1 = 0$ but $\lambda_2 \ne 0$, then to find





we could make the transformation = 1 — and find 
Pm = v(Sf 10, X 2 , ni, m) dG\ 

Jo 

It is easy to verify that 

p(G‘ 1 0, X 2 , ni, nj) = pi(E^I X 2 ,0, 112 , ni) 

if we put G in place of in the latter density. It follows that to calculate Pm 
it would be sufficient to have full tables of Tang’s distribution since 

Pm = / Pi^E 1X 2 ,0, ?i 2 , Wi) dEj^, 

Jo 

Tang's tables are not however sufficiently extensive. Furthermore, tables of
(2.2) are also necessary. As yet these tables do not exist. However, some useful
conclusions can be drawn from the inequalities obtained in the following section.

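First, however, let us evaluate $n_1$, $n_2$, $\lambda_1$ and $\lambda_2$ for the generalized Student's
hypothesis discussed in the introduction. It is easy to see that $n_1 = k - 1$
and $n_2 = k(N - 1)$. To evaluate $\lambda_1$ and $\lambda_2$ we note from Theorem 1 that we
only need substitute $a_{i\nu}$ for $y_{i\nu}$ in $q_1$ and $q_2$ where

$q_1 = N\sum_i (\bar y_i - \bar y)^2$,

$q_2 = \sum_{i,\nu} (y_{i\nu} - \bar y_i)^2$.

Upon making these substitutions we obtain

$\lambda_1 = \dfrac{N}{2\sigma^2}\sum_i (\bar a_i - \bar a)^2$,

$\lambda_2 = \dfrac{1}{2\sigma^2}\sum_{i,\nu} (a_{i\nu} - \bar a_i)^2$.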

Thus the various hypotheses concerning the $a_{i\nu}$ influence the distribution of F
or $E^2 = 1/(1 + n_2/(F n_1))$ by affecting the values of $\lambda_1$ and $\lambda_2$.


4. Limits of the values of p. It follows readily from (2.2) that

p / ni + n 2 \ 

p _ V 2 / ^^2^(711/2)— ^ 2 yn 2 / 2 )—1 


(3.1) 


\ 2 y V 2 / 




Xf‘X?^ 
ati^2 ai! “2! 


(£?")“’(! - ET^G, 




^(ni + 7h ^ I \ T, ('>^i\/'i^\ 

r(l-+»)r(| + »)r(=l±l»y 


where 





Now if o > 0,5 > 0, and j is an integer > 1, we have 






^+6 + 20-1)) < + ■ 


Hence, it follows that 


1 < < 


+ ^ 21 ”* /ri + 112 + 2q:i\““ 


Substituting we see that 
Po c c 


< p < poe 


A ip 2 /’ni + n 2 \ f2X2(1 - H^)”l 

■“pr® —s—J+ 

and 

(3.3) ^ ^ ^ _ ^2) 
Let 2n,<^>* = X,, f = 1, 2. 


+ X2(1 - 


i2\ ( fii -\- n2 


Theorem 3 Let « 


. [' 

Je‘ 


po dE'^ so that e is the probability of an error of the first 


kind. Then, for all values of <j)i 


e > P2 dE^ 


and if E'‘ > ni/(ni + nf), it follows that 


> i exp{ — 2 n 24 >l + 2<^2(1 — E\){ni + 112)) > / . P2 dE'^ > ee . 

JBt 


Furthermore, for all values of<jjl 


(3.7) ‘ 


f PidB" > fpdE\ 

Je] Jb- 

and if > (ni + 2 )/(?ii + na), it follows that 

f pidEi > exp{ — 2 n 2<#>2 + 2 ^ 2(1 — E]){ni + nf) 2<j>l} f pi 1 

JeI •‘St 


> ,pdE > e 


,2 .. -2^2*1 


I r^dE’. 


Finally, ify can assume the two values 0 and 2, it follows that if 
.2. -10g«_ 


then 1/7 = 0 , 


> 2mni + m) - (% + 7 )) ^ 


P 2 dE < €8 





and if y = 2 
(3.10) 


f] V dE^ < S p, dE\ 

JbZ 


PROOF. To prove (3.4) and (3.6) it is only necessary to follow Daly's [6]
procedure.² Since

exp{—2n2<i>i + 2i^3(l — E^)(7h + itf) + 702} 
and 


exp {— 

are decreasing functions of E'^, and 

exp {— 2n502 + 2 ^ 2(1 — E^)(n.i + nf) + 702 } 


if 


< 1 


^ 111 + 7 
Hi “ 1 “ 7I2 

the inequalities (3.5) and (3.7) follow immediately from (3.2) and (3.3). Finally 

exp {— 2?Z202 + 202(1 — E^)(ni + nf) + 702! < fi < 1 

if (3.8) is true, so that (3.9) and (3.10) follow.

From (3.8), (3.9) and (3.10) we can calculate either a lower limit for the bias,
if we know $\phi_2^2$, or the upper limit that $\phi_2^2$ can have if we wish the bias to be not
greater than some given amount. Thus these limits do not answer the important
question of what is a value $\phi$ such that if $\phi_2^2 < \phi$ then the bias is less than
$(1 - \delta)\epsilon$. They only provide a value $\phi'$ of $\phi_2^2$ such that if $\phi_2^2 > \phi'$ then the bias is at
least $(1 - \delta)\epsilon$.

If, for example, $\delta = .5$ and $n_1 = 1$ as in the case of Student's ratio, we have
if $\gamma = 0$

$\phi_2^2 > \dfrac{.693}{2(n_2 E_\epsilon^2 - 1)}$

and if $\epsilon = .05$, then $E_\epsilon^2$ decreases steadily from .903 if $n_2 = 2$, to .063 if $n_2 = 60$,
and the corresponding lower limits of $\phi_2^2$ decrease from .43 to .12. Thus, if
$\phi_2^2 > .43$ or .12 in these two cases, it follows that the probability of rejecting the
average null hypothesis will be not .05 but something less than .025.

If $\delta = .6$ and $n_1 = 4$, $n_2 = 30$ then we can evaluate the lower limit of $\phi_2^2$ for
the example given in the introduction, finding that

$\phi_2^2 > \dfrac{-\log .6}{2(.279)(34) - 8}$

implies a downward bias of at least 40 per cent of .05. Also, if $\phi_2^2 > .07$ then for
any value of $\phi_1^2$ the power of the analysis of variance test is reduced at least 40
per cent.

² The procedure followed is given in [6] on pp. 4, 5, equations (2.2) through Lemma 1.
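The arithmetic of the two examples can be retraced in a few lines. The sketch below takes the first bound exactly as printed, $-\log\delta / (2(n_2 E_\epsilon^2 - 1))$ with $-\log .5 = .693$. For the second it assumes that the numerator is again $-\log\delta$ and reads the printed denominator $2(.279)(34) - 8$ as $2E_\epsilon^2(n_1 + n_2) - 2n_1$. Both function names and that reading of the denominator are assumptions made for illustration.

```python
import math

def lower_limit_student(delta, n2, e2_crit):
    """-log(delta) / (2 (n2 E^2 - 1)): the n1 = 1, gamma = 0 bound as printed."""
    return -math.log(delta) / (2 * (n2 * e2_crit - 1))

def lower_limit_example(delta, n1, n2, e2_crit):
    """-log(delta) / (2 E^2 (n1 + n2) - 2 n1); the numerator is an assumption."""
    return -math.log(delta) / (2 * e2_crit * (n1 + n2) - 2 * n1)

print(round(lower_limit_student(0.5, 2, 0.903), 2))      # 0.43
print(round(lower_limit_student(0.5, 60, 0.063), 2))     # 0.12
print(round(lower_limit_example(0.6, 4, 30, 0.279), 2))  # 0.05
```

The three values agree with the limits .43, .12 and .05 quoted in the text.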

5. Conclusions. The rather sharp effects of a moderate lack of statistical
control on the probabilities associated with the F-test indicate the importance
of testing for statistical control outside of the industrial applications now made.
Furthermore, it would seem advisable to investigate tests and designs that are
less sensitive to the lack of control than is the F-test.

REFERENCES 

[1] P. C. Tang, "The power function of the analysis of variance tests with tables and
illustrations of their use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-157.

[2] William G. Madow, "The distribution of quadratic forms in non-central normal random
variables," Annals of Math. Stat., Vol. 11 (1940), pp. 100-104.

[3] W. G. Cochran, "The distribution of quadratic forms in a normal system, with applications
to the analysis of covariance," Cambridge Phil. Soc. Proc., Vol. 30 (1934),
pp. 178-191.

[4] A. T. Craig, "Note on the independence of certain quadratic forms," Annals of Math.
Stat., Vol. 14 (1943), pp. 195-197.

[5] S. Kolodziejczyk, "On an important class of statistical hypotheses," Biometrika,
Vol. 27 (1935), pp. 161-190.

[6] J. F. Daly, "On the unbiased character of likelihood-ratio tests for independence in
normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-33. The procedure
followed is given on pp. 4, 5, equations (2.2) through Lemma 1.



MIXTURE OF DISTRIBUTIONS 
By Herbert Robbins

Department of Mathematical Statistics, University of North Carolina

1. Summary. Mixtures of measures or distributions occur frequently in the
theory and applications of probability and statistics. In the simplest case it
may, for example, be reasonable to assume that one is dealing with the mixture
in given proportions of a finite number of normal populations with different
means or variances. The mixture parameter may also be denumerably infinite,
as in the theory of sums of a random number of random variables, or continuous,
as in the compound Poisson distribution.

The operation of Lebesgue-Stieltjes integration, $\int f(x)\,d\mu$, is linear with
respect to both integrand f(x) and measure $\mu$. The first type of linearity has as
its continuous analog the theorem of Fubini on interchange of order of integration;
the second type of linearity has a corresponding continuous analog which
is of importance whenever one deals with mixtures of measures or distributions,
and which forms the subject of the present paper. Other treatments of the
same subject have been given ([1], [2]; see also [3], [4]) but it is hoped that the
discussion given here will be useful to the mathematical statistician.

A general measure theoretic form of the fundamental theorem is given in
Section 2, and in Section 3 the theorem is formulated in terms of finite dimensional
spaces and distribution functions. The operation of convolution as an
example of mixture is treated briefly in Section 4, while Section 5 is devoted to
random sampling from a mixed population.

We shall refer to Theory of the Integral by S. Saks (second edition, Warszawa,
1937) as [S], and the Mathematical Methods of Statistics by H. Cramér (Princeton,
1946) as [C].

2. Mixture of measures in general. Let X (Y) be a space with points x (y)
and let $\mathfrak{X}$ ($\mathfrak{Y}$) be a $\sigma$-field of subsets of X (Y). Let $\nu$ be a measure on $\mathfrak{Y}$. Let $\mu_y$
be for a.e. ($\nu$) y a measure on $\mathfrak{X}$, such that $\mu_y(S)$ is for every S in $\mathfrak{X}$ a measurable
($\mathfrak{Y}$) function of y. Define for every S in $\mathfrak{X}$

(1) $\mu(S) = \int_Y \mu_y(S)\,d\nu$.

Theorem 1. $\mu$ is a measure on $\mathfrak{X}$. If $\nu(Y) = \mu_y(X) = 1$, then $\mu(X) = 1$.
Proof. Clear.

Theorem 2. If f(x) is any non-negative or non-positive function measurable
($\mathfrak{X}$) then the function

(2) $g(y) = \int_X f(x)\,d\mu_y$

is measurable ($\mathfrak{Y}$), and

(3) $\int_X f(x)\,d\mu = \int_Y g(y)\,d\nu$.

Proof. First let $f_0(x)$ be any non-negative simple function [S, p. 7] of the
form

(4) $f_0(x) = \{a_1, S_1; \ldots; a_k, S_k\}$

where the $S_i$ are disjoint sets in $\mathfrak{X}$ such that $X = \sum_1^k S_i$ and the $a_i$ are non-negative
constants. Then

(5) $g_0(y) = \int_X f_0(x)\,d\mu_y = \sum_1^k a_i\,\mu_y(S_i)$

is a non-negative function measurable ($\mathfrak{Y}$), and from (1) it follows that each side
of (3) is equal to $\sum_1^k a_i\,\mu(S_i)$. Hence the theorem holds in this case.

Next let f(x) be any non-negative function measurable ($\mathfrak{X}$); then [S, p. 14]
there exists a sequence $f_n(x)$ of simple functions such that for every x,

(6) $0 \le f_1(x) \le f_2(x) \le \cdots$; $\lim_{n\to\infty} f_n(x) = f(x)$.

Setting

(7) $g_n(y) = \int_X f_n(x)\,d\mu_y$, $g(y) = \int_X f(x)\,d\mu_y$,

it follows from the theorem of monotone convergence [S, p. 28] and from the
preceding paragraph that

(8) $\int_X f(x)\,d\mu = \lim_{n\to\infty} \int_X f_n(x)\,d\mu = \lim_{n\to\infty} \int_Y g_n(y)\,d\nu$,

(9) $g(y) = \lim_{n\to\infty} \int_X f_n(x)\,d\mu_y = \lim_{n\to\infty} g_n(y)$.

From (6) and (9) it follows that for a.e. ($\nu$) y,

(10) $0 \le g_1(y) \le g_2(y) \le \cdots$, $\lim_{n\to\infty} g_n(y) = g(y)$.

Hence g(y) is measurable ($\mathfrak{Y}$), and from the theorem of monotone convergence,

(11) $\int_Y g(y)\,d\nu = \lim_{n\to\infty} \int_Y g_n(y)\,d\nu$.

Equation (3) now follows from (8) and (11).

By passing from f(x) to -f(x) we establish (3) when f(x) is any non-positive
function measurable ($\mathfrak{X}$). This completes the proof of Theorem 2.
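On finite spaces Theorem 2 reduces to an interchange of two finite sums, which can be checked exactly. The sketch below uses a hypothetical three-point space X, a two-point space Y, and arbitrarily chosen measures; none of these values come from the paper.

```python
from fractions import Fraction as Fr

X = [0, 1, 2]
nu = {"a": Fr(1, 4), "b": Fr(3, 4)}                  # mixing measure on Y
mu_y = {"a": {0: Fr(1, 2), 1: Fr(1, 2), 2: Fr(0)},   # component measures on X
        "b": {0: Fr(1, 3), 1: Fr(1, 3), 2: Fr(1, 3)}}

def mu(point):
    """Mixture measure of a single point, as in equation (1)."""
    return sum(nu[y] * mu_y[y][point] for y in nu)

def f(x):
    return x * x  # any non-negative function on X

def g(y):
    """Inner integral of f with respect to mu_y, as in equation (2)."""
    return sum(f(x) * mu_y[y][x] for x in X)

lhs = sum(f(x) * mu(x) for x in X)   # integral of f with respect to mu
rhs = sum(g(y) * nu[y] for y in nu)  # integral of g with respect to nu
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared without rounding.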
If f(x) is an arbitrary function measurable ($\mathfrak{X}$) we define

(12) $f^+(x) = f(x)$ if $f(x) \ge 0$, and 0 otherwise; $f^-(x) = f(x)$ if $f(x) < 0$, and 0 otherwise,

so that

(13) $f(x) = f^+(x) + f^-(x)$

is the sum of two functions measurable ($\mathfrak{X}$) of constant sign. By Theorem 2 the
functions

(14) $g^+(y) = \int_X f^+(x)\,d\mu_y$, $g^-(y) = \int_X f^-(x)\,d\mu_y$

are measurable ($\mathfrak{Y}$) and

(15) $0 \le \int_X f^+(x)\,d\mu = \int_Y g^+(y)\,d\nu \le \infty$,

(16) $0 \ge \int_X f^-(x)\,d\mu = \int_Y g^-(y)\,d\nu \ge -\infty$.

The integral $\int_X f(x)\,d\mu$ exists if and only if at least one of the two quantities (15)
and (16) is finite [S, p. 20].

Theorem 3. A necessary and sufficient condition that

(17) $\int_X f(x)\,d\mu = \int_Y \left\{\int_X f(x)\,d\mu_y\right\} d\nu$

is that at least one of the two quantities (15) and (16) be finite.

Proof. By the remark preceding Theorem 3 the condition is clearly necessary.
Now suppose, e.g., that (15) is finite; we must show that (17) holds. By hypothesis,

(18) $\int_X f^+(x)\,d\mu < \infty$, $\int_X f(x)\,d\mu = \int_X f^+(x)\,d\mu + \int_X f^-(x)\,d\mu$.

From (18) and (15) it follows that $0 \le g^+(y) < \infty$ for a.e. ($\nu$) y; hence

(19) $\int_X f(x)\,d\mu_y = \int_X f^+(x)\,d\mu_y + \int_X f^-(x)\,d\mu_y = g^+(y) + g^-(y)$

exists for a.e. ($\nu$) y. From the finiteness of (15) it follows that

(20) $\int_Y (g^+(y) + g^-(y))\,d\nu = \int_Y g^+(y)\,d\nu + \int_Y g^-(y)\,d\nu$

exists. Hence from (19), the integral

(21) $\int_Y \left\{\int_X f(x)\,d\mu_y\right\} d\nu = \int_Y (g^+(y) + g^-(y))\,d\nu$

exists. Equation (17) now follows from (21), (20), (15), and (18). This completes
the proof of Theorem 3.

Corollary 1. If $\mu(X) < \infty$, and if f(x) is bounded from above or from below,
then both sides of (17) exist and the equality holds.

Proof. If, say, $f(x) \le C < \infty$, then

$0 \le \int_X f^+(x)\,d\mu \le C \cdot \mu(X) < \infty$,

and the result follows from Theorem 3.

We shall now show by an example that the existence and even the finiteness of
the right side of (17) does not imply the existence of the left side.

Let X = Y = (1, 2, ..., n, ...) and let $\mathfrak{X}$ ($\mathfrak{Y}$) consist of all subsets of X (Y).
Let $\nu$ be the measure which assigns mass $c_n$ to n, where the $c_n$ are positive constants
such that $\sum_1^\infty c_n = 1$. Let $\mu_n$ assign the mass 1/2n to each of the points
1, 2, ..., 2n. Let f(x) be such that $f(1) = b_1$, $f(2) = -b_1$, $f(3) = b_2$, $f(4) = -b_2$, ...
where the $b_n$ are positive constants. Then

$\int_X f(x)\,d\mu_n = 0$ (n = 1, 2, ...),

so that

$\int_Y \left\{\int_X f(x)\,d\mu_n\right\} d\nu = 0$.

The measure $\mu$ defined by (1) assigns to each n a positive value $\mu(n)$ given by

$\mu(1) = \mu(2) = c_1 \cdot (2)^{-1} + c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$,

$\mu(3) = \mu(4) = c_2 \cdot (2 \cdot 2)^{-1} + c_3 \cdot (2 \cdot 3)^{-1} + \cdots$,

and so on, where $\mu(X) = \sum_1^\infty \mu(n) = \sum_1^\infty c_n = 1$.

Now fix the $b_n$ and $c_n$ in such a way that

$b_1 \cdot \mu(1) + b_2 \cdot \mu(3) + b_3 \cdot \mu(5) + \cdots = \infty$.

Then

$\int_X f^+(x)\,d\mu = -\int_X f^-(x)\,d\mu = \infty$,

so that the left side of (17) does not exist, even though $\nu(Y) = \mu_n(X) = \mu(X) = 1$
and the right side of (17) exists and is equal to zero.
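One concrete choice of constants makes the example tangible. The sketch below assumes $c_n = 2^{-n}$ and $b_k = 1/\mu(2k-1)$, so that every term $b_k \cdot \mu(2k-1)$ equals 1 and the series for the integral of $f^+$ diverges, while each inner integral vanishes. These constants, the truncation of the infinite sums and the function names are illustrative assumptions only.

```python
from fractions import Fraction as Fr

TAIL = 60  # truncation of the infinite sums; an artefact of this sketch

def c(n):
    return Fr(1, 2 ** n)  # mass of nu at n; the c_n sum to 1

def mu_pair(k):
    """mu(2k-1) = mu(2k): sum over n >= k of c_n / (2n), truncated."""
    return sum(c(n) / (2 * n) for n in range(k, k + TAIL))

def b(k):
    return 1 / mu_pair(k)  # chosen so that b_k * mu(2k-1) = 1

def f(x):
    k = (x + 1) // 2
    return b(k) if x % 2 == 1 else -b(k)

def inner_integral(n):
    """Integral of f with respect to mu_n (mass 1/(2n) on 1, ..., 2n)."""
    return sum(f(x) for x in range(1, 2 * n + 1)) * Fr(1, 2 * n)

def positive_part_partial(K):
    """Partial sum of the integral of f+ over the points 1, 3, ..., 2K-1."""
    return sum(b(k) * mu_pair(k) for k in range(1, K + 1))

print([inner_integral(n) for n in range(1, 5)], positive_part_partial(25))
```

The partial sums grow without bound as K increases, which is the divergence required in the example.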


3. A restatement of the preceding results in the form most useful in probability
theory. Let $x = (x_1, \ldots, x_n)$ be a point in the n-dimensional Euclidean
space $R_n$, and let $B_n$ denote the $\sigma$-field of Borel sets in $R_n$. Let $S_x$ denote the
half-open interval in $R_n$ consisting of all points $(w_1, \ldots, w_n)$ in $R_n$ satisfying the
inequalities

(22) $w_1 \le x_1, \ldots, w_n \le x_n$;

then if $\mu$ is any probability measure on $B_n$ the function

(23) $F(x) = \mu(S_x)$

is the distribution function corresponding to $\mu$. Conversely, if F(x) is any distribution
function in $R_n$ [C, p. 80] there is a unique probability measure $\mu$ on $B_n$
such that (23) holds. As a matter of notation we write for any Borel measurable
f(x),

(24) $\int_{R_n} f(x)\,d\mu = \int_{-\infty}^{\infty} f(x)\,dF(x)$

provided the integral on the left exists.

Now let $y = (y_1, \ldots, y_m)$ be a point in $R_m$, let G(y) be a distribution function,
and let $\nu$ denote the corresponding probability measure on $B_m$. Let F(x, y)
be for a.e. ($\nu$) y a distribution function in x, and for every x a Borel measurable
function of y, and let $\mu_y$ be the corresponding probability measure on $B_n$.

Theorem 4. The function

(25) $H(x) = \int_{-\infty}^{\infty} F(x, y)\,dG(y)$

is a distribution function in $R_n$. Let $\mu$ denote the corresponding probability measure
on $B_n$. Then for any S in $B_n$, $\mu_y(S)$ is a Borel measurable function of y and

(26) $\mu(S) = \int_{-\infty}^{\infty} \mu_y(S)\,dG(y)$.


Proof. Let C denote the class of all Borel sets S in $R_n$ such that $\mu_y(S)$ is a
Borel measurable function of y. We shall show that C is a normal class [S, p. 83].

(i) If $S_1, S_2, \ldots$ is a sequence of disjoint sets in C and if $S = \sum_1^\infty S_n$, then

$\mu_y(S) = \mu_y\left(\sum_1^\infty S_n\right) = \sum_1^\infty \mu_y(S_n)$

is a convergent series of Borel measurable functions and is therefore itself a Borel
measurable function.

(ii) If $S_1 \supset S_2 \supset \cdots$ is a decreasing sequence of sets in C and if $S = \prod_1^\infty S_n$,
then

$\mu_y(S) = \mu_y\left(\prod_1^\infty S_n\right) = \lim_{n\to\infty} \mu_y(S_n)$

is the limit of a sequence of Borel measurable functions and is therefore a Borel
measurable function.

Hence C is a normal class. But C contains every interval $S_x$, for $\mu_y(S_x) =
F(x, y)$ was assumed to be a Borel measurable function of y for every x. It
follows [S, p. 85] that $C = B_n$.

It now follows from Theorem 1 that the set function $\mu(S)$ defined by (26)
is a probability measure on $B_n$. The corresponding distribution function is the
function H(x) defined by (25). Thus Theorem 4 is proved.



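Let $f(x) = f^+(x) + f^-(x)$ be any Borel measurable function. Then from Theorem 2,
the integrals

(27) $\int_{-\infty}^{\infty} f^+(x)\,dH(x) = \int_{-\infty}^{\infty} f^+(x)\,d_x\left\{\int_{-\infty}^{\infty} F(x, y)\,dG(y)\right\} = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{\infty} f^+(x)\,d_x F(x, y)\right\} dG(y)$,

(28) $\int_{-\infty}^{\infty} f^-(x)\,dH(x) = \int_{-\infty}^{\infty} f^-(x)\,d_x\left\{\int_{-\infty}^{\infty} F(x, y)\,dG(y)\right\} = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{\infty} f^-(x)\,d_x F(x, y)\right\} dG(y)$

exist. The following theorem is an immediate consequence of Theorem 3 and
Corollary 1.

Theorem 5. A necessary and sufficient condition that

(29) $\int_{-\infty}^{\infty} f(x)\,d_x\left\{\int_{-\infty}^{\infty} F(x, y)\,dG(y)\right\} = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{\infty} f(x)\,d_x F(x, y)\right\} dG(y)$

is that the left side of (29) exist; i.e. that at least one of the quantities (27) and (28)
be finite. This will be true in particular if f(x) is bounded from above or from below.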


4. The operation of convolution. An example of the general mixture (25)
of distribution functions is the operation of convolution: if F(x), G(x) are two
distribution functions in $R_1$ then F(x, y) = F(x - y) satisfies the conditions of
Theorem 4, so that

(30) $H(x) = \int_{-\infty}^{\infty} F(x - y)\,dG(y)$

is also a distribution function in $R_1$, denoted by

(31) $H(x) = F(x) * G(x)$.

Corresponding to any distribution function F(x) in $R_1$ is the characteristic
function

(32) $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$

which in turn uniquely determines F(x) [C, p. 93].

Theorem 6. Let F(x), G(x), H(x) be distribution functions in $R_1$ and let $\varphi_1(t)$,
$\varphi_2(t)$, $\varphi(t)$ be the corresponding characteristic functions. Then

(33) $H(x) = F(x) * G(x)$

if and only if

(34) $\varphi(t) = \varphi_1(t) \cdot \varphi_2(t)$.



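Proof. Assume (33) holds. Since $|e^{itx}| \le 1$ we have from Theorem 5,

(35) $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dH(x) = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{\infty} e^{itx}\,d_x F(x - y)\right\} dG(y) = \int_{-\infty}^{\infty} e^{ity} \left\{\int_{-\infty}^{\infty} e^{itw}\,dF(w)\right\} dG(y) = \varphi_1(t) \cdot \varphi_2(t)$.

The converse implication now follows from the fact that the characteristic function
of a distribution determines the latter uniquely.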
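For discrete distributions the direct half of Theorem 6 can be verified numerically. The sketch below convolves two probability mass functions and compares the characteristic function of the result with the product of the two characteristic functions. The die and coin distributions and the function names are illustrative choices.

```python
import cmath

def char_fn(pmf, t):
    """Characteristic function of a discrete distribution given as {value: prob}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf_f, pmf_g):
    """Distribution of the sum of independent variables with the given pmfs."""
    out = {}
    for x, p in pmf_f.items():
        for y, q in pmf_g.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
total = convolve(die, coin)

for t in (0.3, 1.7):
    print(abs(char_fn(total, t) - char_fn(die, t) * char_fn(coin, t)))
```

The printed differences should be zero up to rounding error.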

The importance of the operation * in probability theory arises from the fact
that if X, Y are independent random variables with respective distribution functions
F(x), G(x), and if Z = X + Y, then the distribution function H(x) of Z
satisfies (33), since for any value of a,

(36) $H(a) = P[X + Y \le a] = \iint_{x + y \le a} dF(x)\,dG(y) = \int_{-\infty}^{\infty} \left\{\int_{-\infty}^{a - y} dF(x)\right\} dG(y) = \int_{-\infty}^{\infty} F(a - y)\,dG(y) = F(a) * G(a)$,

the evaluation of the double integral by an iterated integral following from
Fubini's theorem [S, pp. 76-88]. However, (33) may hold without X, Y being
independent, and Theorem 6 shows that (34) will then hold also, and conversely.

An example where H(x) = F(x) * G(x) without X, Y being independent
has been given by Cramér [C, p. 317, exercise 2]. We shall give another. Let
points O, A, ..., F in the (x, y)-plane be defined as follows.

O = (0, 0), A = (1, 1), B = (1/2, 1), C = (0, 1/2), D = (1, 0),
E = (1, 1/2), F = (1/2, 0).

Let f(x, y) have the value 2 inside the quadrilateral OABC and the triangle DEF,
and 0 elsewhere. Then if f(x, y) is the joint frequency function of X, Y it is
easily seen that X and Y have uniform distributions on the intervals $0 \le x \le 1$,
$0 \le y \le 1$ respectively and that Z = X + Y has the triangular distribution
given by (33), although X and Y are not independent.
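The claims about this example can be checked on a grid. The sketch below divides the unit square into N by N cells, gives a cell full weight when it lies inside OABC or DEF and half weight when one of the lines y = x, y = x + 1/2, y = x - 1/2 bisects it, and then compares the marginals with the uniform distribution and the distribution of X + Y with the triangular one. The grid size, the centre-based classification of cells for the sum, and the function names are assumptions of this sketch.

```python
N = 200  # cells per axis; even, so the three boundary lines bisect whole cells

def cell_mass(i, j):
    """Mass of the cell with lower-left corner (i/N, j/N) under density 2 on OABC and DEF."""
    d = j - i
    half = N // 2
    if 0 < d < half or d < -half:
        inside = 1.0   # cell lies wholly inside OABC or DEF
    elif d in (0, half, -half):
        inside = 0.5   # a boundary line of the support bisects the cell
    else:
        inside = 0.0
    return 2.0 * inside / N ** 2

def marginal_x(i):
    return sum(cell_mass(i, j) for j in range(N))

def marginal_y(j):
    return sum(cell_mass(i, j) for i in range(N))

def cdf_sum(z):
    """Approximate P(X + Y <= z), classifying each cell by its centre."""
    return sum(cell_mass(i, j) for i in range(N) for j in range(N)
               if (i + j + 1) / N <= z)

def triangular_cdf(z):
    return z * z / 2 if z <= 1 else 1 - (2 - z) ** 2 / 2

for z in (0.5, 1.0, 1.5):
    print(z, cdf_sum(z), triangular_cdf(z))
```

A cell such as the one near (0.05, 0.95) has zero mass although both marginal densities are positive there, which exhibits the dependence of X and Y.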

It would be interesting to know what distribution functions F(x) are such that
if X, Y, Z = X + Y are random variables with the distribution functions F(x),
F(x), F(x) * F(x) respectively, then X and Y are necessarily independent. A
rather trivial example of such a distribution function is the step function F(x)
with jumps of ½ at the points x = 0 and x = 1. It can be shown (oral communication
by W. Hoeffding), in generalization of Cramér's example, that no absolutely
continuous distribution function (e.g. the normal distribution function)
has this property.


5. The problem of random sampling from a mixed population. Let G(v) be
a distribution function in the real variable v, and let F(u, v) be for a.e. (relative
to the measure corresponding to G) v a distribution function in the real variable
u, and for every u a Borel measurable function of v. Let

(37) $H(u) = \int_{-\infty}^{\infty} F(u, v)\,dG(v)$;

then by Theorem 4 H(u) is a distribution function in $R_1$. Now define for
$x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$,

(38) $H(x) = H(x_1) \cdots H(x_n)$, $G(y) = G(y_1) \cdots G(y_n)$.

Both H(x) and G(y) are then distribution functions in $R_n$. In particular, H(x)
is the distribution function of a random sample of n independent variates each
with the distribution function (37). Set

(39) $F(x, y) = F(x_1, y_1) \cdots F(x_n, y_n)$;

then for a.e. (relative to the measure corresponding to G) y, F(x, y) is a distribution
function in x, and for every x, F(x, y) is a Borel measurable function of y.
By Fubini's theorem we have 

Six) = [ Fixi,yi) dGiyi) . [ Fix„, y„) cIGiy^) 

J — 00 V—eo 

(40) = f • • • f Fixi , 2 /i) • • • F(.r„ , ?/„) dGiyi) ■ • • dGiy„) 

J —00 


== f Fix, y) clGiy). 

J—00 


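Thus H(x) is itself a mixture in the sense of Theorem 4. It follows from Theorem
5 that for any Borel measurable function f(x),

(41)    $\int_{-\infty}^{\infty} f(x)\, dH(x) = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} f(x)\, d_x F(x, y) \right\} dG(y),$

if and only if the left side of (41) exists. When written out in full (41) becomes

(42)    $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n)\, d_{x_1}\!\left[ \int_{-\infty}^{\infty} F(x_1, y_1)\, dG(y_1) \right] \cdots d_{x_n}\!\left[ \int_{-\infty}^{\infty} F(x_n, y_n)\, dG(y_n) \right] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n)\, d_{x_1} F(x_1, y_1) \cdots d_{x_n} F(x_n, y_n) \right\} dG(y_1) \cdots dG(y_n).$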


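Equation (41) is of particular interest in connection with the distribution
of a statistic t = t(x_1, ..., x_n) = t(x). For any distribution function J(x) let
K(t | J) denote the distribution function of t when x has the distribution function
J(x). If we set

(43)    $f(x) = \begin{cases} 1 & \text{if } t(x) < t, \\ 0 & \text{otherwise,} \end{cases}$

then

(44)    $K(t \mid J) = \int_{-\infty}^{\infty} f(x)\, dJ(x).$

Hence from (41),

(45)    $K(t \mid H(x_1) \cdots H(x_n)) = K(t \mid H) = \int_{-\infty}^{\infty} K(t \mid F(x, y))\, dG(y) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} K(t \mid F(x_1, y_1) \cdots F(x_n, y_n))\, dG(y_1) \cdots dG(y_n).$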



As an example, let t(x) be Student's ratio

(46)    $t = n^{1/2}\, \bar{x} / s,$

let

(47)    $F(u, v) = (2\pi)^{-1/2} \int_{-\infty}^{u} e^{-(w - v)^2 / 2}\, dw,$

and let

(48)    $G(v) = \begin{cases} 0 & \text{for } v < -a, \\ 1/2 & \text{for } -a \le v < a, \\ 1 & \text{for } a \le v. \end{cases}$

Then H(u) will be the distribution function of a mixture in equal proportions of
two normal populations with unit variances and with means -a, a respectively,
and K(t | H(x_1) ··· H(x_n)) will be the distribution function of t in random
samples of n from this non-normal population. On the other hand,
K(t | F(x_1, y_1) ··· F(x_n, y_n)) will be the distribution function of t in sampling
from successive normal populations with unit variances and means y_1, ..., y_n
respectively. Relation (45) now becomes

(49)    $K(t \mid H(x_1) \cdots H(x_n)) = \sum_{y_1, \dots, y_n} K(t \mid F(x_1, y_1) \cdots F(x_n, y_n)) / 2^n,$

where the summation is over all 2^n sets (y_1, ..., y_n), each y_i being either -a
or a. Due to the complexity of K(t | F(x_1, y_1) ··· F(x_n, y_n)) (the frequency
function of which is discussed in a forthcoming paper by the author), relation
(49) is not very useful. In other cases (45) may afford a considerable simplification
in the evaluation of the distribution function of a statistic obtained in
random sampling from a mixed population.
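Relation (49) lends itself to a simulation check. The sketch below is illustrative only: it assumes s in Student's ratio is the sample standard deviation with divisor n - 1, takes arbitrary values of a, n and t, and estimates both sides of (49) by Monte Carlo. The helper names `student_t`, `cdf_given_means` and `cdf_mixture` are hypothetical.

```python
import itertools
import math
import random

def student_t(sample):
    """Student's ratio sqrt(n) * mean / s (s taken with divisor n - 1, an assumption)."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(s2)

def cdf_given_means(means, t0, reps, rng):
    """Estimate K(t0 | F(x1, y1) ... F(xn, yn)): unit-variance normals with the given means."""
    hits = sum(student_t([rng.gauss(m, 1.0) for m in means]) < t0 for _ in range(reps))
    return hits / reps

def cdf_mixture(a, n, t0, reps, rng):
    """Estimate K(t0 | H(x1) ... H(xn)): each observation has mean -a or a with probability 1/2."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(rng.choice((-a, a)), 1.0) for _ in range(n)]
        hits += student_t(sample) < t0
    return hits / reps

rng = random.Random(1948)          # fixed seed so the run is repeatable
a, n, t0 = 1.0, 3, 0.5             # illustrative values only
left = cdf_mixture(a, n, t0, 80000, rng)
configs = list(itertools.product((-a, a), repeat=n))     # all 2**n vectors of means
right = sum(cdf_given_means(c, t0, 10000, rng) for c in configs) / 2 ** n
print(left, right)
```

The two printed estimates should agree up to sampling error; the right side needs 2^n separate distributions, which is the practical difficulty with (49) noted above.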

REFERENCES 

[1] W. Feller, "On the integro-differential equations of purely discontinuous Markoff
processes," Am. Math. Soc. Trans., Vol. 48 (1940), p. 488.

[2] R. H. Cameron and W. T. Martin, "An unsymmetric Fubini theorem," Am. Math.
Soc. Bull., Vol. 47 (1941), p. 121.

[3] P. R. Halmos, "The decomposition of measures," Duke Math. Jour., Vol. 8 (1941),
p. 380.

[4] W. Feller, "On a general class of 'contagious' distributions," Annals of Math. Stat.,
Vol. 14 (1943), p. 389.



SOME APPLICATIONS OF THE MELLIN TRANSFORM IN STATISTICS 


By Benjamin Epstein 


Coal Research Laboratory, Carnegie Institute of Technology 


1. Summary. It is well known that the Fourier transform is a powerful analytical
tool in studying the distribution of sums of independent random variables.
In this paper it is pointed out that the Mellin transform is a natural analytical
tool to use in studying the distribution of products and quotients of independent
random variables. Formulae are given for determining the probability density
functions of the product ξη and the quotient ξ/η, where ξ and η are independent
positive random variables with p.d.f.'s f(x) and g(y), in terms of the Mellin
transforms $F(s) = \int_0^{\infty} f(x)\, x^{s-1}\, dx$ and $G(s) = \int_0^{\infty} g(y)\, y^{s-1}\, dy$. An extension of the
transform technique to random variables which are not everywhere positive is
given. A number of examples including Student's t-distribution and Snedecor's
F-distribution are worked out by the technique of this paper.


2. Introduction. It is well known [2], [3] that the Fourier transform is a
useful analytical tool for studying the distribution of the sums of independent
random variables. It is our purpose in this paper to study another transform
which is useful in studying the distribution of the product of independent random
variables. While it is perfectly true that one can reduce the study of the distribution
of the random variable ξ = ξ_1 ξ_2 ··· ξ_n, the product of n independent
random variables ξ_1, ξ_2, ···, ξ_n, to the study of the distribution of the random
variable η = log ξ = log ξ_1 + log ξ_2 + ··· + log ξ_n, the sum of n independent
random variables, it seems worth while to study the distribution problem directly.
There are advantages inherent in the direct attack on the distribution problem
which are lost to a considerable degree if the problem is so transformed that the
Fourier transform becomes applicable. In this paper we shall show that the
direct application of the Mellin transform to the study of the distribution of
products of independent random variables yields results of interest.


3. Connection between Mellin transforms and products of independent
random variables. The importance of Fourier transforms in studying the
distribution of sums of independent random variables rests on the
following result: if ξ_1 and ξ_2 are independent random variables with continuous¹
probability density functions (henceforth abbreviated as p.d.f.) f_1(x) and f_2(x),
respectively, then the p.d.f. f(x) of the random variable ξ = ξ_1 + ξ_2 is expressible
as

(1)    $f(x) = \int_{-\infty}^{\infty} f_1(x - y) f_2(y)\, dy = \int_{-\infty}^{\infty} f_2(x - y) f_1(y)\, dy.$

¹ In this paper we shall assume throughout that we are dealing with random variables
with continuous p.d.f.'s. The argument can be extended with some changes to distribution
functions which are perfectly general, but for simplicity this will not be done here.

But since these expressions are just the Fourier convolutions of f_1(x) and f_2(x),
it is small wonder that the Fourier transform plays such a basic role in studying
the distribution properties of sums of independent random variables.
Consider now the following result for products of independent random variables
[4], [5]: if ξ_1 is a random variable with continuous p.d.f. f_1(x) and ξ_2, independent
of ξ_1, is a positive random variable with continuous p.d.f. f_2(x), then the p.d.f.
f(x) of the random variable ξ = ξ_1 ξ_2 is expressible² as

(2)    $f(x) = \int_{0}^{\infty} \frac{1}{y}\, f_1\!\left(\frac{x}{y}\right) f_2(y)\, dy.$

But equation (2) is precisely in the form of a Mellin convolution of f_1(x) and f_2(x)
and therefore it may be expected that the Mellin transform should be useful in
studying the distribution of products of independent random variables.
It is useful to indicate briefly the properties of the Mellin transform. A detailed
treatment of this transform will be found in [6] and we shall, therefore,
stress only those portions of the theory of Mellin transforms which are of importance
in the field of statistics. By definition, the Mellin transform F(s),
corresponding to a function f(x) defined only³ for x > 0, is

(3)    $F(s) = \int_{0}^{\infty} f(x)\, x^{s-1}\, dx.$

Under certain restrictions on f(x) [6, p. 47], F(s) considered as a function of the
complex variable s is a function of exponential type, analytic in a strip parallel
to the imaginary axis. The width of the strip is governed by the order of
magnitude of f(x) in the neighborhood of the origin and for large values of x and,
in particular, the strip of analyticity becomes a half-plane if f(x) decays exponentially
as x → ∞. There is a reciprocal formula enabling one to go from the
transform F(s) to the function f(x). This transformation is:

(4)    $f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} x^{-s} F(s)\, ds$

for all x where f(x) is continuous and where the path of integration is any line
parallel to the imaginary axis and lying within the strip of analyticity of F(s).

² More generally [4, p. 411], if ξ_1 and ξ_2 are independent random variables with continuous
p.d.f.'s f_1(x) and f_2(x), then the p.d.f. of the random variable ξ = ξ_1 ξ_2 is expressible as

(2')    $f(x) = \int_{-\infty}^{\infty} \frac{1}{|y|}\, f_1\!\left(\frac{x}{y}\right) f_2(y)\, dy.$

In [4] analogous results are given for random variables with perfectly general distribution
functions.

³ The reason for this restriction is that there are technical difficulties in defining a Mellin
transform directly for a function defined over (-∞, ∞). In [6], for instance, the Mellin
transform theory is given for functions defined only for positive values of the argument.
In statistical terminology this means that we are restricting ourselves for the moment to
positive random variables. This is, of course, an unnatural restriction and we shall indicate
later in the paper a simple device for treating such questions.


If, in particular, we are interested in applying Mellin transforms to p.d.f.'s
of positive⁴ random variables, the analysis can be carried out rigorously. Also,
as in the case of the Fourier transform, one has the desirable property that there
is a one-one correspondence between p.d.f.'s and their transforms.

A number of common p.d.f.'s of positive random variables have simple Mellin
transforms. For example see Table 1.

In terms familiar to the mathematical statistician, the Mellin transform of a
positive random variable ξ with continuous p.d.f. f(x) is E(ξ^{s-1}), where

(5)    $F(s) = E(\xi^{s-1}) = \int_{0}^{\infty} x^{s-1} f(x)\, dx.$

The following three basic properties hold:

(i) The positive random variable η = aξ, a > 0, has the Mellin transform
G(s) = a^{s-1} F(s). This is immediate since

(6)    $G(s) = E(\eta^{s-1}) = E(a^{s-1} \xi^{s-1}) = a^{s-1} F(s).$

(ii) The positive random variable η = ξ^α has the Mellin transform
G(s) = F(αs - α + 1). To prove this we note that

(7)    $G(s) = E(\eta^{s-1}) = E(\xi^{\alpha s - \alpha}) = F(\alpha s - \alpha + 1).$

In particular if α = -1, i.e., η = 1/ξ, then G(s) = F(-s + 2).
This is a result which we shall have occasion to use later in the paper.

(iii) If ξ_1 and ξ_2 are independent positive random variables with Mellin transforms
F_1(s) and F_2(s), respectively, then the Mellin transform of the product η = ξ_1 ξ_2
is G(s) = F_1(s) F_2(s). This is immediate since

(8)    $G(s) = E(\eta^{s-1}) = E(\xi_1^{s-1} \xi_2^{s-1}) = E(\xi_1^{s-1})\, E(\xi_2^{s-1}) = F_1(s) F_2(s).$

More generally if ξ_1, ξ_2, ···, ξ_n are independent positive random variables with
Mellin transforms F_1(s), F_2(s), ···, F_n(s), then the Mellin transform of the
random variable η = ξ_1 ξ_2 ··· ξ_n is G(s) = F_1(s) F_2(s) ··· F_n(s). This relationship
is fundamental and justifies the introduction of Mellin transforms in
studying products of independent random variables.
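Properties (i) to (iii) can be illustrated numerically. The sketch below is not part of the paper: it assumes ξ has the p.d.f. e^{-x} (whose Mellin transform is the Gamma function), evaluates transforms for real s by the substitution x = e^u followed by a trapezoid rule, and compares both sides of each property. The function `mellin`, its integration limits and step, and the values of s, a and alpha are ad hoc assumptions.

```python
import math

def mellin(f, s, lo=-30.0, hi=8.0, h=0.1):
    """Approximate the Mellin transform of f at real s.

    With x = exp(u) the transform is the integral of exp(s*u) * f(exp(u)) du;
    the trapezoid rule on [lo, hi] is used (limits and step are ad hoc).
    """
    n = int(round((hi - lo) / h))
    total = 0.0
    for k in range(n + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(s * u) * f(math.exp(u))
    return total * h

def f(x):
    """Assumed p.d.f. of xi: exponential with unit mean."""
    return math.exp(-x)

s, a, alpha = 1.5, 3.0, 2.0      # illustrative values

# (i) eta = a * xi has p.d.f. f(y / a) / a and transform a**(s-1) * F(s).
lhs_i = mellin(lambda y: f(y / a) / a, s)
rhs_i = a ** (s - 1) * mellin(f, s)

# (ii) eta = xi**alpha has p.d.f. f(y**(1/alpha)) * y**(1/alpha - 1) / alpha
#      and transform F(alpha*s - alpha + 1).
lhs_ii = mellin(lambda y: f(y ** (1 / alpha)) * y ** (1 / alpha - 1) / alpha, s)
rhs_ii = mellin(f, alpha * s - alpha + 1)

# (iii) eta = xi1 * xi2: the p.d.f. is the integral of (1/x) f(y/x) f(x) dx,
#       which is a Mellin transform at s = 0 of x -> f(y/x) f(x).
def product_pdf(y):
    return mellin(lambda x: f(y / x) * f(x), 0.0)

lhs_iii = mellin(product_pdf, s)
rhs_iii = mellin(f, s) ** 2
print(lhs_i, rhs_i, lhs_ii, rhs_ii, lhs_iii, rhs_iii)
```

Working in the variable u = log x makes the integrands smooth and rapidly decaying, which is why a plain trapezoid rule is adequate here; a different p.d.f. may need different limits.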

From (8) it is clear that we can find the p.d.f. g(y) of the random variable
η which is the product of two positive independent random variables ξ_1 and ξ_2
with continuous p.d.f.'s f_1(x) and f_2(x). In fact, by the Mellin inversion formula

(9)    $g(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} G(s)\, ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} F_1(s) F_2(s)\, ds,$

where the path of integration is any line parallel to the imaginary axis and lying
within the strip of analyticity of G(s). As in the case of characteristic functions,
it can be shown that there is a one-one correspondence between p.d.f.'s and their
Mellin transforms. Therefore, it follows that the p.d.f. g(y) computed in this
way must be precisely equal to

(10)    $\int_{0}^{\infty} \frac{1}{x}\, f_2\!\left(\frac{y}{x}\right) f_1(x)\, dx = \int_{0}^{\infty} \frac{1}{x}\, f_1\!\left(\frac{y}{x}\right) f_2(x)\, dx.$

It is easy to verify this directly by showing that the Mellin transform of the
right-hand side of (10) is F_1(s) F_2(s) [6, p. 52], but this will not be done here.
The essential point is that Equation (9) (which is sometimes easier to evaluate
than Equation (10)) is a consequence of an algebraic formalism which is
capable of revealing relationships which would otherwise remain hidden.

⁴ See footnote 3.

[TABLE 1. Recoverable column headings: "Mellin Transform" and "Region of Analyticity of Transform"; the entries are not legible.]

The problem of finding the p.d.f. h(y) of η = ξ_1/ξ_2, the ratio of two positive
independent random variables with continuous p.d.f.'s, can be reduced to that of
finding the p.d.f. of the product of the independent random variables ξ_1 and 1/ξ_2.
If F_1(s) and F_2(s) are the Mellin transforms corresponding to ξ_1 and ξ_2,
respectively, then by (ii) F_2(-s + 2) is the Mellin transform of 1/ξ_2 and,
therefore, the Mellin transform H(s) of ξ_1/ξ_2 is F_1(s) F_2(-s + 2). Therefore,
the p.d.f. h(y) of η is

(11)    $h(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} H(s)\, ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} F_1(s) F_2(-s + 2)\, ds.$

This formula is useful in finding distributions such as Student's t and Fisher's z.

4. A modified Mellin transform procedure for finding the distribution of the
product of independent random variables which are not everywhere positive.
Up to this point we have limited ourselves to the application of the Mellin
transform to finding the distribution of the product or ratio of two positive
independent random variables. While it is true that a number of interesting
probability density functions are defined only for positive⁵ values of the argument,
it is certainly desirable that we be able to treat situations involving random
variables capable of taking on both positive and negative values. A simple device
for extending the Mellin transform treatment to the more general problem is to
decompose the p.d.f.'s f_1(x) and f_2(x) of the independent random variables ξ_1
and ξ_2 into

f_1(x) = f_11(x) + f_12(x),
f_2(x) = f_21(x) + f_22(x),

where⁶

f_11(x) = 0, x < 0;    f_12(x) = 0, x > 0;
f_21(x) = 0, x < 0;    f_22(x) = 0, x > 0,

and then to operate on the pairs [f_11(x), f_21(x)], [f_11(x), f_22(x)], [f_12(x), f_21(x)], and
[f_12(x), f_22(x)] separately. More specifically, the frequency distribution h(y)
corresponding to the random variable η = ξ_1 ξ_2 is made up of the sum of four
components h_1(y), h_2(y), h_3(y), and h_4(y). To compute h_1(y) one can apply
the Mellin transform directly to the evaluation of the expression

$h_1(y) = \int_{0}^{\infty} \frac{1}{x}\, f_{11}\!\left(\frac{y}{x}\right) f_{21}(x)\, dx,$

since both f_11(x) and f_21(x) are zero for negative values of x. The function h_1(y)
is zero for y < 0. To compute h_2(y) we first evaluate

$h_2^*(y) = \int_{0}^{\infty} \frac{1}{x}\, f_{11}\!\left(\frac{y}{x}\right) f_{22}(-x)\, dx.$

Again f_11(x) and f_22(-x) are zero for negative values of x and, therefore, the
conventional Mellin transform can be applied in determining h_2*(y). It is clear that
h_2*(y) = 0 for y < 0 and, therefore, h_2(y) = h_2*(-y) = 0 for y > 0. Similarly,
one can find h_3(y) and h_4(y), where h_3(y) = 0 for y > 0 and h_4(y) = 0 for y < 0,
and it is readily seen that⁷

h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)

is the desired p.d.f. of η = ξ_1 ξ_2.

⁵ For example, distributions of type 3, the χ² distribution, the distribution of the sample
standard deviation and sample variance, the distribution of an even power of a random
variable, etc. are all defined only for positive values of the argument.


⁶ Of course, f_11, f_12, f_21, and f_22 are generally not p.d.f.'s since $\int_0^{\infty} f_{11}(x)\, dx$, $\int_{-\infty}^{0} f_{12}(x)\, dx$,
$\int_0^{\infty} f_{21}(x)\, dx$, $\int_{-\infty}^{0} f_{22}(x)\, dx$ are no longer necessarily equal to one.

⁷ As in footnote 6, h_1, h_2, h_3, and h_4 are, in general, not p.d.f.'s.

5. Examples of use of Mellin transforms in evaluating the product and
quotient of independent random variables. Example 1: The distribution of
η = ξ_1 ξ_2, where ξ_1 and ξ_2 are independent random variables with p.d.f.'s f_1(x)
and f_2(x), respectively, where

$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$

In this case

f_1(x) = f_11(x) + f_12(x)    and    f_2(x) = f_21(x) + f_22(x),

where

f_11(x) = 0, x < 0;    f_12(x) = 0, x > 0;
f_21(x) = 0, x < 0;    f_22(x) = 0, x > 0.

The random variable η = ξ_1 ξ_2 has a p.d.f. h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)
where

h_1(y) is associated with [f_11(x), f_21(x)],
h_2(y) is associated with [f_11(x), f_22(x)],
h_3(y) is associated with [f_12(x), f_21(x)],
and h_4(y) is associated with [f_12(x), f_22(x)].

It is sufficient to evaluate

$h_1(y) = \int_{0}^{\infty} \frac{1}{x}\, f_{11}\!\left(\frac{y}{x}\right) f_{21}(x)\, dx.$

In this case

$F_{11}(s) = \int_{0}^{\infty} x^{s-1} f_{11}(x)\, dx = \int_{0}^{\infty} x^{s-1}\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\, \Gamma(s/2),$    analytic for Re(s) > 0,

and

$F_{21}(s) = \int_{0}^{\infty} x^{s-1} f_{21}(x)\, dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\, \Gamma(s/2).$

Therefore,

$H_1(s) = F_{11}(s) F_{21}(s) = \frac{2^{s-3}}{\pi}\, \Gamma^2(s/2),$

$h_1(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s} H_1(s)\, ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y^{-s}\, \frac{2^{s-3}}{\pi}\, \Gamma^2(s/2)\, ds, \quad c > 0,$

$\phantom{h_1(y)} = \frac{1}{2\pi}\, K_0(y), \quad y > 0$    [6, p. 197],

where K_0(y) is Bessel's function of the second kind with a purely imaginary argument
of zero order. Similarly

$h_2(y) = h_3(y) = \frac{1}{2\pi}\, K_0(|y|), \quad y < 0; \qquad h_4(y) = \frac{1}{2\pi}\, K_0(y), \quad y > 0.$

Therefore,

$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\, K_0(|y|), \qquad -\infty < y < \infty,$

and this is the desired p.d.f. This result has been found by other methods and
is given in [1, p. 1].
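The result of Example 1 can be checked by direct quadrature. The sketch below is a numerical illustration rather than the inversion used in the paper: it assumes both variables are standard normal, evaluates the product formula for the density with the substitution x = e^u, and compares it with K_0(|y|)/π, where K_0 is computed from the standard integral representation K_0(z) = ∫_0^∞ exp(-z cosh t) dt. The helper names and the integration limits are ad hoc.

```python
import math

def trapezoid(g, lo, hi, h):
    """Composite trapezoid rule for g on [lo, hi] with step h."""
    n = int(round((hi - lo) / h))
    total = 0.5 * (g(lo) + g(hi)) + sum(g(lo + k * h) for k in range(1, n))
    return total * h

def normal_pdf(x):
    """Standard normal p.d.f. (the assumed common density of the two factors)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def product_pdf(y):
    """p.d.f. of the product of two standard normal variates at y != 0.

    For y > 0 only h1 and h4 are nonzero and they are equal by symmetry (and
    likewise h2, h3 for y < 0), so h(y) = 2 * int_0^inf (1/x) f(|y|/x) f(x) dx.
    Substituting x = exp(u) removes the factor 1/x.
    """
    y = abs(y)
    g = lambda u: normal_pdf(y * math.exp(-u)) * normal_pdf(math.exp(u))
    return 2.0 * trapezoid(g, -12.0, 4.0, 0.05)

def bessel_k0(z):
    """K_0(z) for z > 0 from K_0(z) = int_0^inf exp(-z * cosh(t)) dt."""
    return trapezoid(lambda t: math.exp(-z * math.cosh(t)), 0.0, 12.0, 0.01)

for y in (0.25, 1.0, 2.5):
    print(y, product_pdf(y), bessel_k0(y) / math.pi)
```

The point y = 0 is excluded because K_0 has a logarithmic singularity there; the density is still integrable.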

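Example 2: The distribution of η = ξ_1/ξ_2, where ξ_1 and ξ_2 are independent random
variables with p.d.f.'s f_1(x) and f_2(x), respectively, where

$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$

As in Example 1, one splits the determination of h(y), the p.d.f. of η, into four
parts: h_1(y), h_2(y), h_3(y), h_4(y). In the notation of Example 1 it is easy to show
that H_1(s), the Mellin transform of h_1(y), is

$F_{11}(s) F_{21}(-s + 2) = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\, \Gamma(s/2) \cdot \frac{2^{(-s-1)/2}}{\sqrt{\pi}}\, \Gamma(-s/2 + 1) = \frac{1}{4 \sin \frac{\pi s}{2}}.$

Therefore,

$h_1(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{y^{-s}}{4 \sin \frac{\pi s}{2}}\, ds, \quad 0 < c < 2,$

$\phantom{h_1(y)} = \frac{1}{2\pi}\, \frac{1}{1 + y^2}, \quad y > 0.$

Similarly

$h_2(y) = \frac{1}{2\pi}\, \frac{1}{1 + y^2}, \quad y < 0,$

$h_3(y) = \frac{1}{2\pi}\, \frac{1}{1 + y^2}, \quad y < 0,$

$h_4(y) = \frac{1}{2\pi}\, \frac{1}{1 + y^2}, \quad y > 0.$

Therefore,

$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\, \frac{1}{1 + y^2}, \qquad -\infty < y < \infty.$

This result has been found by other methods and given in [4, p. 411].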
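The transform step in Example 2 can be spot-checked for real s between 0 and 2. The sketch below assumes the standard normal density for both variables, uses the closed form 2^{(s-3)/2} Γ(s/2)/√π for the transform of the positive half of that density, and computes the Mellin transform of the component 1/(2π(1 + y²)) by a trapezoid rule in u = log y. The names `mellin`, `F11` and `h1`, and the integration limits, are illustrative assumptions.

```python
import math

def mellin(f, s, lo=-80.0, hi=80.0, h=0.05):
    """Mellin transform of f at real s via x = exp(u) and the trapezoid rule (ad hoc limits)."""
    n = int(round((hi - lo) / h))
    total = 0.0
    for k in range(n + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(s * u) * f(math.exp(u))
    return total * h

def F11(s):
    """Mellin transform of the positive half of the standard normal p.d.f."""
    return 2.0 ** ((s - 3.0) / 2.0) * math.gamma(s / 2.0) / math.sqrt(math.pi)

def h1(y):
    """Component of the quotient p.d.f. on y > 0 whose transform is being checked."""
    return 1.0 / (2.0 * math.pi * (1.0 + y * y))

for s in (0.5, 1.0, 1.5):
    product_of_transforms = F11(s) * F11(2.0 - s)        # F11(s) * F21(-s + 2)
    closed_form = 1.0 / (4.0 * math.sin(math.pi * s / 2.0))
    print(s, product_of_transforms, closed_form, mellin(h1, s))
```

The three printed columns should coincide for each s, which is the content of the reflection formula Γ(z)Γ(1 - z) = π / sin(πz) applied with z = s/2.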

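Example 3: F-Distribution. Let ξ_1, ···, ξ_m, η_1, ···, η_n be (m + n) independent
random variables, each normally distributed with mean zero and standard
deviation σ. Let

$\xi = \sum_{i=1}^{m} \xi_i^2, \qquad \eta = \sum_{j=1}^{n} \eta_j^2.$

We want to find the p.d.f. h(z) of ζ where ζ = ξ/η. The p.d.f.'s f(x) and g(y)
of ξ and η, respectively, are:

$f(x) = \frac{x^{m/2 - 1}\, e^{-x/2\sigma^2}}{2^{m/2} \sigma^m\, \Gamma(m/2)}, \quad x > 0,$

$g(y) = \frac{y^{n/2 - 1}\, e^{-y/2\sigma^2}}{2^{n/2} \sigma^n\, \Gamma(n/2)}, \quad y > 0.$

In this case

$F(s) = \frac{2^{s-1} \sigma^{2s-2}\, \Gamma\!\left(s + \frac{m}{2} - 1\right)}{\Gamma(m/2)},$    analytic for Re(s) > 1 - m/2,

$G(s) = \frac{2^{s-1} \sigma^{2s-2}\, \Gamma\!\left(s + \frac{n}{2} - 1\right)}{\Gamma(n/2)},$    analytic for Re(s) > 1 - n/2.

The p.d.f. h(z) has Mellin transform

$H(s) = F(s)\, G(-s + 2) = \frac{\Gamma\!\left(s + \frac{m}{2} - 1\right) \Gamma\!\left(-s + \frac{n}{2} + 1\right)}{\Gamma(m/2)\, \Gamma(n/2)}.$

Therefore,

$h(z) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} z^{-s} H(s)\, ds, \qquad -\frac{m}{2} + 1 < c < \frac{n}{2} + 1,$

$\phantom{h(z)} = \frac{\Gamma\!\left(\frac{m + n}{2}\right)}{\Gamma(m/2)\, \Gamma(n/2)}\, \frac{z^{m/2 - 1}}{(z + 1)^{(m + n)/2}}, \qquad z > 0.$

A convenient way of carrying out the inversion is to use formula (d) in Table 1.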
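As a rough check on Example 3, the density above can be integrated numerically and compared with a simulation of the ratio of two sums of squares. Everything in the sketch is illustrative: the values of m, n, σ and z_0, the midpoint rule in `ratio_cdf`, and the seed are arbitrary choices, and the density coded in `ratio_pdf` is the form Γ((m+n)/2) z^{m/2-1} / (Γ(m/2) Γ(n/2) (1+z)^{(m+n)/2}).

```python
import math
import random

def ratio_pdf(z, m, n):
    """Density of zeta = xi / eta for sums of m and n squared normal variates."""
    c = math.gamma((m + n) / 2) / (math.gamma(m / 2) * math.gamma(n / 2))
    return c * z ** (m / 2 - 1) / (1 + z) ** ((m + n) / 2)

def ratio_cdf(z0, m, n, steps=20000):
    """Midpoint-rule integral of ratio_pdf over (0, z0)."""
    h = z0 / steps
    return h * sum(ratio_pdf((k + 0.5) * h, m, n) for k in range(steps))

def simulate_cdf(z0, m, n, sigma, reps, rng):
    """Fraction of simulated ratios xi / eta that do not exceed z0."""
    hits = 0
    for _ in range(reps):
        xi = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(m))
        eta = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
        hits += xi <= z0 * eta
    return hits / reps

rng = random.Random(7)                   # fixed seed so the run is repeatable
m, n, sigma, z0 = 4, 6, 2.5, 1.0         # illustrative values; sigma should cancel
exact = ratio_cdf(z0, m, n)
approx = simulate_cdf(z0, m, n, sigma, 40000, rng)
print(exact, approx)
```

For m = 4, n = 6 the density reduces to 12 z / (1 + z)^5, so the integral up to z_0 = 1 can also be done by hand as a cross-check on the quadrature.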
In a similar way one can find Student's distribution, i.e., the distribution of
ζ = ξ_0/η, where $\eta = \sqrt{\sum_{i=1}^{n} \xi_i^2 / n}$, and where ξ_0, ξ_1, ···, ξ_n are n + 1 independent
random variables each having the distribution

$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-x^2/2\sigma^2}, \qquad -\infty < x < \infty.$


It should be mentioned in conclusion that the Mellin transform is a natural 
tool to use in situations involving the products and quotients of independent 
uniformly distributed random variables, or in finding products and/or quotients 
and/or Beta-distribution. In such cases formulae (b), (c) and (d) in Table 1 
are useful. 

REFERENCES 

[1] C. C. Craig, "On the frequency function of xy," Annals of Math. Stat., Vol. 7 (1936),
pp. 1-15.

[2] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in
Mathematics, No. 36, Cambridge, 1937.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] J. H. Curtiss, "On the distribution of the quotient of two chance variables," Annals
of Math. Stat., Vol. 12 (1941), pp. 409-421.

[5] E. V. Huntington, "Frequency distribution of product and sum," Annals of Math.
Stat., Vol. 10 (1939), pp. 195-198.

[6] E. C. Titchmarsh, Introduction to the Theory of Fourier Integrals, Clarendon Press,
Oxford, 1937.



THE ESTIMATION OF LINEAR TRENDS 


By G. W. Housner and J. F. Brennan
California Institute of Technology

1. Summary. This paper deals with the problem of bivariate regression
where both variates are random variables having a finite number of means
distributed along a straight line. A regression statistic is derived which is
independent of change in scale so that a prior knowledge of the frequency distribution
parameters is not required in order to obtain a unique estimate. The statistic
is shown to be consistent. The efficiency of the estimate is discussed and its
asymptotic distribution is derived for the case when the random variables
are normally distributed. A numerical example is presented which compares
the performance of the statistic of this paper with that of other commonly used
statistics. In the example it is found that the method of estimation proposed
in this paper is more efficient.

2. Introduction. A problem that often arises in statistical work is the estimation
of linear trends. In the general problem it is known or presumed that a
linear functional relation exists among a set of variables of the form,

a + b_1 X + b_2 Y + b_3 Z + ··· = 0.

The observed values of the variables are of the form

x_ik = X_i + ε_ik,    y_ik = Y_i + η_ik,    etc.

That is, the x_ik are random variables with means X_i, and k = 1, 2, ···, N_i observed
values of x are associated with the mean X_i. The ordering of the X_i is according
to magnitude. Similarly there are the observed values y_ik, z_ik and so forth.
The ε_ik are random variables, with the same distribution for all i, with zero
means. On the basis of a sample O_n(x_ik, y_ik, z_ik, ···) it is desired to estimate
the coefficients a, b_1, b_2, b_3, ···. One method used to estimate the coefficients
is that of "weighted regression" which is essentially an application of the method
of least squares. The problem has been studied by R. Allen, A. Wald and
others.¹ The chief difficulty has been that the proposed methods of estimation
require an a priori knowledge of the variances of the random variables. Wald
has proposed a statistic which avoids this difficulty but which may have a relatively
low efficiency in cases often encountered in practice. In this paper there
is described a bivariate statistic which appears to have comparatively high precision
and which does not require prior knowledge of the variances of the random
variables. A numerical example is given at the end of the paper to illustrate the
comparative performances of different methods of estimation.

¹ For a brief history of work done on this problem see the paper by A. Wald in the Annals
of Math. Stat., Vol. 11 (1940), p. 284.


3. The Regression statistic. In the case of the bivariate problem, consider a
sample

O_n(x_ik, y_ik),    i = 1, 2, ···, n    and    k = 1, 2, ···, N_i,

where N_i sample values x_ik, y_ik are distributed about the mean X_i, Y_i. Let the
means be related by Y_i = a + bX_i and let the random variables x_i be independent
and have the same frequency distribution with variance σ_x² for all i and the random
variables y_i have independent frequency distributions with variance σ_y²
the same for all i. An appropriate statistic for estimating b is obtained by noting
that a pair of sample points (x_ik, y_ik), (x_jl, y_jl) gives a sample value of the
change in y corresponding to a change in x. It may thus be said that a sample
value of b is

(1)    $b_{ik,jl} = \frac{y_{ik} - y_{jl}}{x_{ik} - x_{jl}}.$

Making use of the fact that

(2)    $y_{ik} = a + b x_{ik} + \eta_{ik} - b \epsilon_{ik}$

equation (1) may be written

$(x_{ik} - x_{jl})\, b_{ik,jl} = (x_{ik} - x_{jl})\, b + (\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl}).$

Summing this equation over all combinations of points there is obtained

(3)    $b = \frac{\sum_i \sum_j \sum_k \sum_l (y_{ik} - y_{jl})}{\sum_i \sum_j \sum_k \sum_l (x_{ik} - x_{jl})} - \frac{\sum_i \sum_j \sum_k \sum_l \left[ (\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl}) \right]}{\sum_i \sum_j \sum_k \sum_l (x_{ik} - x_{jl})}.$

The summations in the above expression are to be carried out for

l = 1, 2, ···, N_j;    k = 1, 2, ···, N_i;    j = 1, 2, ···, (i - 1);    i = 1, 2, ···, n.

The first term on the right side of equation (3) is an estimate of b and the second
term represents the deviation of the estimate from the true value. Accordingly,
we take as an estimate of b the statistic

(4)    $\hat{b} = \frac{\sum_i \sum_j \sum_k \sum_l (y_{ik} - y_{jl})}{\sum_i \sum_j \sum_k \sum_l (x_{ik} - x_{jl})}.$

This requires, of course, that the denominator be not equal to zero. Summing
out the subscripts k and l reduces (4) to

$\hat{b} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{i-1} N_i N_j (\bar{y}_i - \bar{y}_j)}{\sum_{i=1}^{n} \sum_{j=1}^{i-1} N_i N_j (\bar{x}_i - \bar{x}_j)},$

where ȳ_i is the mean value of the y_ik and so forth. Summing out the subscript
j gives

(5)    $\hat{b} = \frac{\sum_{i=1}^{n} \left( N_i \bar{y}_i \sum_{j=1}^{i-1} N_j - N_i \sum_{j=1}^{i-1} N_j \bar{y}_j \right)}{\sum_{i=1}^{n} \left( N_i \bar{x}_i \sum_{j=1}^{i-1} N_j - N_i \sum_{j=1}^{i-1} N_j \bar{x}_j \right)}.$

This expression may be put in a more convenient form by using the identity

$\sum_{i=1}^{n} \left( N_i \sum_{j=1}^{i-1} N_j \bar{y}_j \right) = \sum_{i=1}^{n} \left( N_i \bar{y}_i \sum_{j=i+1}^{n} N_j \right) = \sum_{i=1}^{n} \left( N_i \bar{y}_i \left( \sum_{j=1}^{n} N_j - \sum_{j=1}^{i} N_j \right) \right).$

With this substitution equation (5) becomes

(6)    $\hat{b} = \frac{\sum_{i=1}^{n} N_i \bar{y}_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)}{\sum_{i=1}^{n} N_i \bar{x}_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)}.$

This is the statistic for estimating the linear trend of bivariate data. It may be
noted that its derivation is not based on the notion of fitting a line to the sample
points. A line y = â + b̂x may be fitted to the sample points by making it pass
through the mean of the sample points, that is, by using the following estimate:

$\hat{a} = \bar{y} - \hat{b} \bar{x},$

where ȳ and x̄ are the means of all the y_ik and x_ik respectively.
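The reduction from the pairwise form (4) to the closed form (6) can be confirmed on a small data set. The sketch below is an illustration under stated assumptions: the observations are made up, each inner list holds the (x, y) points belonging to one mean, and the groups are assumed to be already ordered by the magnitude of X_i. The function names are hypothetical.

```python
def slope_pairwise(groups):
    """Estimate (4): sum of all differences y_ik - y_jl over the sum of x_ik - x_jl, for j < i."""
    num = den = 0.0
    for i, group_i in enumerate(groups):
        for group_j in groups[:i]:
            for x_ik, y_ik in group_i:
                for x_jl, y_jl in group_j:
                    num += y_ik - y_jl
                    den += x_ik - x_jl
    return num / den

def slope_closed_form(groups):
    """Estimate (6): group means weighted by N_i * (sum_j N_j - 2 * sum_{j<=i} N_j + N_i)."""
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    num = den = 0.0
    running = 0                       # sum of N_j for j <= i
    for g, n_i in zip(groups, sizes):
        running += n_i
        weight = n_i * (total - 2 * running + n_i)
        num += weight * sum(y for _, y in g) / n_i
        den += weight * sum(x for x, _ in g) / n_i
    return num / den

def intercept(groups, slope):
    """Estimate of a: the fitted line passes through the mean of all sample points."""
    points = [p for g in groups for p in g]
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    return y_bar - slope * x_bar

# Made-up observations, one inner list per mean, ordered by X_i.
groups = [
    [(0.9, 1.2), (1.3, 0.8)],
    [(2.1, 2.4)],
    [(2.8, 3.1), (3.3, 2.7), (3.0, 3.4)],
    [(4.2, 3.9), (3.8, 4.3)],
]
b_pairs = slope_pairwise(groups)
b_closed = slope_closed_form(groups)
print(b_pairs, b_closed, intercept(groups, b_closed))
```

The closed form needs only the group means and sizes, so it costs one pass over the data, whereas the pairwise form grows with the square of the number of observations.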


4. Consistency of the estimate. Having established the statistics b̂ and â it
is desirable to examine the consistency and efficiency of the estimates, particularly
for b̂. To determine that b̂ is a consistent estimate we investigate the
behavior of (6) as the number of sample points increases, that is, as the N_i → ∞.
We wish first to establish the following identity. Consider the sum of the
following array of terms:

N_1(N_1 + N_2 + ··· + N_n)
N_2(N_1 + N_2 + ··· + N_n)
· · ·
N_n(N_1 + N_2 + ··· + N_n)

The sum may be written $\sum_{i=1}^{n} N_i \sum_{j=1}^{n} N_j$. Since the array of products N_i N_j is
symmetrical the expression $2 \sum_{i=1}^{n} N_i \sum_{j=1}^{i} N_j$ also gives the sum of the array except
for the fact that the terms along the principal diagonal are counted twice. We
have, therefore

$\sum_{i=1}^{n} N_i \sum_{j=1}^{n} N_j = 2 \sum_{i=1}^{n} N_i \sum_{j=1}^{i} N_j - \sum_{i=1}^{n} N_i^2.$

Rearranging terms we obtain the identity

(7)    $\sum_{i=1}^{n} N_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right) = 0.$

Now substituting (2) into (6) and making use of (7) there is obtained,

(8)    $\hat{b} = b + \frac{\sum_{i=1}^{n} N_i (\bar{\eta}_i - b \bar{\epsilon}_i) \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)}{\sum_{i=1}^{n} N_i \bar{x}_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)}.$

The ε_ik and η_ik are random variables with zero means so that as N_i → ∞ the sample
means ε̄_i and η̄_i converge in probability to zero. As N_i → ∞, x̄_i converges in
probability to its mean X_i. In view of (7) and that the denominator in (8)
is not equal to zero the last term in (8) converges in probability to zero and b̂ → b.
The estimate is therefore consistent. A similar argument also shows the estimate
â to be consistent.

5. Efficiency of the estimate. A general investigation of the efficiency of the
estimate b̂ is beyond the scope of this paper. We may note, however, that the
efficiency of the estimate can be made to depend upon the grouping of the data,
that is, the optimum efficiency of the estimate may depend upon the omission
of some of the pairs (y_ik - y_jl) from the estimate. The maximum efficiency is
obtained for b̂ when the second term in (3) is minimized. This requires prior
knowledge of the frequency distribution of the random variables x and y; however,
in applications a recognition of (3) may often indicate a practical method
of increasing the efficiency.

In what follows we make an investigation of the precision of the estimate b̂
for a special case which is of some practical interest. Let x and y be random
variables as defined in the first part of the paper and consider the new variables
u, v defined by b̂ = v/u, that is,

$u = \sum_{i=1}^{n} \left[ N_i \bar{x}_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right) \right],$

$v = \sum_{i=1}^{n} \left[ N_i \bar{y}_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right) \right].$

The random variables u and v are then independently distributed with joint
probability element f(u) f(v) du dv. Making the change of variable u = r cos θ,
v = r sin θ the probability element becomes f(r, θ) dr dθ where tan θ = b̂. Integrating
out the variable r gives the probability element for θ. In what follows we
investigate the distribution of θ for the case where x and y are normally distributed
with the same variance. Since u and v are linear functions of x and y
respectively they are also normally distributed with the same standard deviation.


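We designate the means of u and v by m_1 and m_2 respectively and the standard
deviation by σ. The probability element in u and v is then

(9)    $\frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (u - m_1)^2 + (v - m_2)^2 \right] \right\} du\, dv.$

Changing variables to r, θ and setting m_1 = r̄ cos θ̄, m_2 = r̄ sin θ̄ we obtain the
following probability element:

$\frac{r}{2\pi\sigma^2} \exp\left\{ -\frac{1}{2\sigma^2} \left[ (r \cos\theta - \bar{r} \cos\bar{\theta})^2 + (r \sin\theta - \bar{r} \sin\bar{\theta})^2 \right] \right\} dr\, d\theta.$

Completing the square in r and substituting φ = θ - θ̄ there is obtained

(10)    $\frac{r}{2\pi\sigma^2} \exp\left\{ -\frac{\bar{r}^2}{2\sigma^2} \sin^2\phi \right\} \exp\left\{ -\frac{1}{2\sigma^2} (r - \bar{r} \cos\phi)^2 \right\} dr\, d\phi.$

To integrate out r make further change of variable

(11)    $t = \frac{r}{\sigma} - \frac{\bar{r}}{\sigma} \cos\phi.$

Setting (r̄/σ) cos φ = w for convenience in notation there is obtained

(12)    $\frac{1}{2\pi} \exp\left\{ -\frac{\bar{r}^2}{2\sigma^2} \sin^2\phi \right\} (t + w)\, e^{-t^2/2}\, dt\, d\phi.$

The variable t is to be integrated out of this expression. The corresponding
limits of integration are exhibited by

r = 0,  t = -w;    r = ∞,  t = ∞.

Now as the number of points in the original sample increases the value of r̄
also increases and as σ/r̄ → 0, with |φ| < π/2, the value of w → ∞. In this case
then (12) approaches asymptotically to

$\frac{1}{\sqrt{2\pi}}\, \frac{\bar{r}}{\sigma} \cos\phi\, \exp\left\{ -\frac{\bar{r}^2}{2\sigma^2} \sin^2\phi \right\} d\phi.$

As σ/r̄ → 0 this distribution shows that φ converges in probability to zero and
that the distribution approaches asymptotically to the normal form

(13)    $\frac{1}{\sqrt{2\pi}}\, \frac{\bar{r}}{\sigma} \exp\left\{ -\frac{\bar{r}^2 \phi^2}{2\sigma^2} \right\} d\phi.$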
It is required then to examine the conditions under which σ/r̄ assumes small
values. If the variance of the original variables x_i and y_i is designated by σ_1²
then since u and v are linear functions of x_i and y_i respectively the variance of u
and of v is

(14)    $\sigma^2 = \sigma_1^2 \sum_{i=1}^{n} N_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)^2.$

Now r̄² is the sum of the squares of the means of u and v so that

(15)    $\bar{r}^2 = (1 + b^2) \left[ \sum_{i=1}^{n} N_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right) X_i \right]^2.$

Dividing (14) by (15) we obtain

(16)    $\left( \frac{\sigma}{\bar{r}} \right)^2 = \frac{\sigma_1^2}{1 + b^2}\, \frac{\sum_{i=1}^{n} N_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right)^2}{\left[ \sum_{i=1}^{n} N_i \left( \sum_{j=1}^{n} N_j - 2 \sum_{j=1}^{i} N_j + N_i \right) X_i \right]^2}.$

Inspection of (16) indicates that as the number of sample points N_i increases the
value of σ/r̄ decreases rapidly. To illustrate this we examine some particular
cases. Consider first the case of four equally spaced means X_i = 3iσ_1,
(i = 1, 2, 3, 4) and let there be one sample point for each mean (N_i = 1).
With these values there is obtained,

$\left( \frac{\sigma}{\bar{r}} \right)^2 = \frac{0.022}{1 + b^2}.$

For b = 1 the range -9° < φ < +9° includes 95% of the population defined by
(13). As the number of points N_i is increased or as the number of means X_i
is increased the value of σ/r̄ decreases rapidly. Consider now eight equally
spaced means X_i = 3iσ_1, (i = 1, 2, ···, 8) with again one sample point for each
mean (N_i = 1). With these values there is obtained

$\left( \frac{\sigma}{\bar{r}} \right)^2 = \frac{0.00045}{1 + b^2}.$

For b = 1 the range -1° < φ < +1° includes 95% of the population defined
by (13).

It is clear that a very high degree of precision is obtamed with the estimate 6 
when there is a considerable number of sample points. However, this will also 
be true in general of other statistics and it is really of interest to compare pre¬ 
cisions in those cases where the statistics have a relatively low precision A 
detailed comparison is beyond the scope of this paper. However, a direct com¬ 
parison can be made very easily in the particular case when x, is a fixed variate 



386 


G. TV. HOUSNER AND J. F. BRENNAN 


and only y, is a random variable. For the sake of simplicity, let each A^, = l 
then the statistic for estimating h is 

n 

iiy^ ~v) Z) 2/.(f - ^ 

_ _ _1 _ 

_ n 

X) — ^) 2 3;i(i — i) 

1 1 

Since 6 is a linear function of the y, by a well knoivn theorem its variance is 



The customary least squares regression line of $y$ on $x$ gives for the estimate of $b$
and its variance


n 



In the particular case when the Xi are equally spaced, = ci + d, the estimates 
h and hn are identical: 

^ ? ='■<’ - 

6. Numerical example. From a practical point of view the case where $x$ and
$y$ are random variables is of greater interest than that where $x$ is a fixed variate. We
give a numerical example of this case, comparing the statistic $b$ with several other
statistics. Consider the case where there is one sample point for each mean $\bar{X}_i$.
We shall evaluate the following:

1) The statistic of this paper, which for this case is

    $b_1 = \frac{\sum_{i=1}^{n} y_i (i - \bar{i})}{\sum_{i=1}^{n} x_i (i - \bar{i})}.$


2) The statistic obtained by minimizing the sum of the squares of the $y$
deviations only,

    $b_2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$




LINEAR TRENDS 


387 


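3) The statistic obtained by minimizing the sum of the squares of the orthogonal
deviations,

    $b_3 = \frac{\sum (y_i - \bar{y})^2 - \sum (x_i - \bar{x})^2 + \sqrt{\left[\sum (y_i - \bar{y})^2 - \sum (x_i - \bar{x})^2\right]^2 + 4\left[\sum (y_i - \bar{y})(x_i - \bar{x})\right]^2}}{2 \sum (y_i - \bar{y})(x_i - \bar{x})},$

where every sum runs over $i = 1, \cdots, n$.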


TABLE I

Set     x1     y1     x2     y2     x3     y3     x4     y4
 1      [?]    [?]    [?]    2.0    3.0    2.7    3.6    4.3
 2      [?]    [?]    [?]    2.0    3.4    3.1    3.8    4.2
 3      1.0    1.4    1.6    2.1    2.8    3.2    4.4    4.3
 4      0.6    0.7    1.8    2.0    3.3    2.6    3.8    4.0
 5      0.7    1.4    1.7    [?]    2.7    3.4    4.1    4.1
 6      1.0    1.2    1.6    2.0    [?]    2.6    3.6    4.0
 7      1.3    0.7    1.7    2.1    2.7    2.9    4.0    3.6

([?] marks an entry that is illegible in the source.)


TABLE II

Set                 b1        b2        b3        b4
 1                1.160     1.068     [?]       1.162
 2                1.056     1.009     [?]       1.027
 3                0.860     0.843     [?]       0.870
 4                0.946     0.896     0.924     0.830
 5                0.875     0.867     [?]       1.000
 6                0.978     0.939     [?]       0.846
 7                1.044     0.959     [?]       1.000
Mean              0.990     0.940     0.996     0.962
7 x Sample Var    0.0686    0.0373    0.1058    0.0834

([?] marks an entry that is illegible in the source.)


4) The statistic proposed by Wald (loc. cit.),

    $b_4 = \frac{\sum_{i=1}^{n/2} y_i - \sum_{i=n/2+1}^{n} y_i}{\sum_{i=1}^{n/2} x_i - \sum_{i=n/2+1}^{n} x_i}.$

We apply these statistics to sample data having four means $\bar{X}_i = i$ and $\bar{Y}_i = i$,
$(i = 1, 2, 3, 4)$. By means of a table of random numbers seven sets of data were





















obtained, each set having one sample point corresponding to each mean. These
sample points are described by Table I, where it will be noted that the sample
points were drawn from a discrete distribution. The estimates obtained from
the four statistics are exhibited in Table II.

If the 28 sample points are treated as a single set of data and the four statistics
in their appropriate forms are applied, there is obtained the following set of
estimates:

    b1        b2        b3        b4
    0.9768    [?]       0.9786    0.9496

The preceding computations show that the estimate $b_2$ is inferior to the other
estimates, as would be expected. The estimate $b_3$ is most accurate when the 28
sample points are treated as a single set of data, with the estimate $b_1$ being only
very slightly less accurate, $b_1 = 0.9768$ as compared to $b_3 = 0.9786$. When the
individual sets of sample points 1 to 7 are considered it is seen that the estimate
$b_1$ is most accurate, with the estimate $b_3$ rather less accurate; the estimate $b_1$ is
more precise than $b_3$, the sample variances being in the ratio 0.0686 : 0.1058 =
0.65. From a practical viewpoint we may also point out that the computation
of $b_1$ requires very much less labor than the computation of $b_3$.
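
The four statistics can be checked numerically on the rows of Table I that are fully legible. The following is a minimal sketch, not the authors' computation: the function name `slope_estimates` and the choice of sets 3 and 7 are assumptions made for illustration, and $b_3$ is computed from the usual orthogonal-regression root with the factor 2 in the denominator.

```python
from math import sqrt

# Sets 3 and 7 as read from Table I: x_1..x_4 and y_1..y_4.
SETS = {
    3: ([1.0, 1.6, 2.8, 4.4], [1.4, 2.1, 3.2, 4.3]),
    7: ([1.3, 1.7, 2.7, 4.0], [0.7, 2.1, 2.9, 3.6]),
}


def slope_estimates(x, y):
    """Return (b1, b2, b3, b4) for one sample point per mean."""
    n = len(x)
    index = range(1, n + 1)
    i_bar = sum(index) / n
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # 1) Statistic of the paper: weights (i - i_bar) applied to y and to x.
    b1 = (sum(yi * (i - i_bar) for i, yi in zip(index, y))
          / sum(xi * (i - i_bar) for i, xi in zip(index, x)))
    # 2) Least squares on the y deviations only.
    b2 = sxy / sxx
    # 3) Least squares on the orthogonal deviations.
    d = syy - sxx
    b3 = (d + sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    # 4) Wald's statistic: difference of half-sample sums (ratio is sign-free).
    half = n // 2
    b4 = (sum(y[half:]) - sum(y[:half])) / (sum(x[half:]) - sum(x[:half]))
    return b1, b2, b3, b4


for label, (x, y) in SETS.items():
    print(label, [round(b, 3) for b in slope_estimates(x, y)])
```

For these two sets the computed $b_1$ and $b_4$ agree with the corresponding entries of Table II to three decimals.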



ON THE EFFECT OF DECIMAL CORRECTIONS ON ERRORS OF 

OBSERVATION 

By Philip Hartman and Aurel Wintner
The Johns Hopkins University 

1. Summary. Let $t$ be the true value of what is being measured and suppose
that the error of observation is a symmetric normal distribution of standard
deviation $\sigma$. The "rounding-off" error due to the reading of measurements to
the nearest unit has a distribution and an expected value depending on $t$ and $\sigma$.
It is shown that, for a fixed $\sigma > 0$, the expected value of the decimal correction,
$r(t; \sigma)$, is an analytic function of $t$ which is odd, of period 1, positive for $0 < t < \tfrac{1}{2}$,
and has a convex arch as its graph on $0 \le t \le \tfrac{1}{2}$. Furthermore, if $0 < t < \tfrac{1}{2}$,
both $r(t; \sigma)$ and its maximum value, $\max_t r(t; \sigma)$, are decreasing functions of $\sigma$.

2. Introduction. Let $X$ be an error of observation and let $\phi(x)$ denote the
density of probability of the distribution of $X$. In particular,

(1)    $\int_{-\infty}^{+\infty} \phi(x)\,dx = 1$, where $\phi(x) \ge 0$.

If $t$ is any fixed number, the density of probability of the distribution of
$X + t$ is $\phi(x - t)$.

Besides the "instrumental error of observation", $X$, there is another error, that
of the "rounding-off", which is carried along in the registration of the
measurements. It is introduced by the circumstance that, if $\cdots$, $b$, $a$ are digits, and if
$b$ denotes the last digit considered, then decimal fractions such as $\cdots ba$ and
$\cdots ba \cdots$ are registered as $\cdots b$ if $a < 5$ and as $\cdots (b + 1)$ if $a > 5$. Let
the unit, in which the measurements are expressed, be so chosen that the first
digit neglected becomes the first digit following the decimal point, i.e., that the
error of the "rounding-off" is between $\pm\tfrac{1}{2}$. Then, if $t$ denotes the true value of
what is being measured, the remark made after (1) shows that the probability that
the error of the decimal corrections be less than $x$ is given by

    $\sum_{n=-\infty}^{\infty} \int_{n-1/2}^{n+x} \phi(u - t)\,du,$

if $|x| \le \tfrac{1}{2}$, whereas this probability is 0 or 1 according as $x < -\tfrac{1}{2}$ or $x > \tfrac{1}{2}$.
Since the last series can be written in the form

(2)    $\sum_{n=-\infty}^{\infty} \int_{-1/2}^{x} \phi(u + n - t)\,du = \int_{-1/2}^{x} \sum_{n=-\infty}^{\infty} \phi(u + n - t)\,du$,    $(\phi \ge 0)$,

it follows that the density of probability of the error due to the decimal
corrections is

(3)    $\sum_{n=-\infty}^{\infty} \phi(x + n - t)$ if $|x| < \tfrac{1}{2}$ and 0 if $|x| > \tfrac{1}{2}$.




Consequently, if $r = r(t)$ denotes the expected value of the decimal error induced
on the "true" value, $t$, of the observations, then

(4)    $r(t) = \int_{-1/2}^{1/2} x \sum_{n=-\infty}^{\infty} \phi(x + n - t)\,dx.$

Formula (4) is known.¹ It is usually based on its intuitive interpretation which
results if, on the one hand, (4) is written in the form

(5)    $r(t) = \int_{-\infty}^{\infty} s(x)\,\phi(x - t)\,dx,$

where

(6)    $s(x) = x$ if $-\tfrac{1}{2} < x < \tfrac{1}{2}$ and $s(x) = s(x + 1)$, $-\infty < x < \infty$,

and, on the other hand, the periodic function (6) is thought of as representing the
uniform distribution of the error of "rounding-off" over the arithmetical continuum
over a period,

    $n - \tfrac{1}{2} < x < n + \tfrac{1}{2}$    $(n = 0, \pm 1, \cdots)$,

on the $x$-axis. Needless to say, the specification of $s(x)$ at the points $x = n + \tfrac{1}{2}$,
which are disregarded in the definition (6), is immaterial, since $s(x)$ occurs in
(5) only as an integrable weight-factor, isolated values of which do not influence
the integral.

It follows at once from (1), (5) and the continuity (almost everywhere) of
(6), that $r(t)$ is continuous.

3. Fourier analysis of $r(t)$. Since the Fourier expansion of the periodic
function (6) is

(7)    $s(x) = \sum_{n=1}^{\infty} (-1)^{n+1} (\pi n)^{-1} \sin 2\pi n x = s(x \pm 1) = \cdots$,    $(|x| < \tfrac{1}{2})$,

it follows from (5) that²

(8)    $r(t) = \sum_{n=1}^{\infty} (-1)^{n+1} (\pi n)^{-1} \int_{-\infty}^{\infty} \phi(x) \sin 2\pi n (x + t)\,dx.$

Hence, if the sine in (8) is expressed in terms of $2\pi n x$ and $2\pi n t$,

(9)    $\pi r(t) = -\sum_{n=1}^{\infty} \frac{(-1)^n}{n} (a_n \cos 2\pi n t + b_n \sin 2\pi n t),$

¹ F. Zernike, "Wahrscheinlichkeitsrechnung und mathematische Statistik," Handbuch
der Physik, Vol. 3 (1928), pp. 475-476.

² In view of (1), the term-by-term integration leading from (5) to (8) is justified by the
fact that the partial sums of the series (7) are uniformly bounded. Correspondingly, the
above deduction of (9) and (10) from (4) is equivalent to an application of Poisson's
summation formula. In this regard, cf. A. Wintner, "The sum formulae of Euler-Maclaurin and
the inversions of Fourier and Möbius," Am. Jour. of Math., Vol. 69 (1947), pp. 685-708,
the end of §1 (p. 687) and its application on p. 697.





where

(10)    $b_n + i a_n = \int_{-\infty}^{\infty} \phi(x) \exp(2\pi i n x)\,dx$,    $(n = 1, 2, \cdots)$.

Let it be assumed that positive and negative errors of observation, when of the
same magnitude, are equally probable, i.e., that $\phi(x) = \phi(-x)$. Then (10)
shows that $a_n$ becomes 0. Hence, (9) reduces to

(11)    $r(t) = -\sum_{n=1}^{\infty} (-1)^n (c_n/n) \sin 2\pi n t,$

where

(12)    $c_n = \pi^{-1} \int_{-\infty}^{\infty} \phi(x) \cos 2\pi n x\,dx = 2\pi^{-1} \int_{0}^{\infty} \phi(x) \cos 2\pi n x\,dx.$

Clearly, $r(t)$ is an odd function whenever the density $\phi(x)$ is even.

4. The normal case. Suppose that $\phi(x)$ is the density of a symmetric normal
(Gaussian) distribution. Then, if $\sigma$ is the positive constant representing the
standard deviation of the errors of observation,

(13)    $\phi(x) = (2\pi\sigma^2)^{-1/2} \exp(-\tfrac{1}{2} x^2/\sigma^2)$    $(0 < \sigma < \infty)$.

It is clear from (5) and (6) that

(14)    $r(t) \to s(t)$ if $\sigma \to 0$ in (13).

Actually, all that (14) says is a triviality, according to which the total error
becomes the decimal error when the measurements become infinitely sharp.
In this limiting case, that is, if $r(t) = s(t)$, it is seen from (6) that the graph of the
periodic function $r = r(t)$ is piecewise linear, and therefore discontinuous.
If $\sigma = 0$ is replaced by $0 < \sigma < \infty$, the jumps of $r(t)$ at $t = \pm\tfrac{1}{2}$ disappear
(cf. the end of §3) and, as will be proved below,

(I) $r(t)$ is an analytic function which is odd, of period 1, and positive for $0 < t < \tfrac{1}{2}$
(hence negative for $-\tfrac{1}{2} < t < 0$), and

(II) the graph of $r = r(t)$ over the fundamental interval $0 \le t \le \tfrac{1}{2}$ is a convex
arch, no matter what the value of $\sigma$ in (13) may be.

Since $r$ now depends both on the "true" value, $t$, of the observations and the
"precision", $\sigma$, of the measurements, let $r$ be denoted by $r(t; \sigma)$. It will be shown
that

(i) $\mathrm{Max}\ r(t; \sigma)$, where the Max refers to $t$ while $\sigma$ is fixed, is a decreasing function
of $\sigma$, where $\sigma$ varies on the half-line $0 < \sigma < \infty$, and that, on the same half-line,

(ii) $r(t; \sigma)$ is a decreasing function of $\sigma$ at every fixed $t$ contained in the
fundamental region $0 < t < \tfrac{1}{2}$.

All of this seems to be clear for physical reasons. Actually, it is easy to give
examples of distribution laws, distinct from (13), for which the above assertions
become false.





5. The $\vartheta_3$-function. As is well-known,

    $\int_{-\infty}^{\infty} \exp(-\tfrac{1}{2} x^2/\sigma^2) \cos ux\,dx = (2\pi\sigma^2)^{1/2} \exp(-\tfrac{1}{2}\sigma^2 u^2).$

Hence, the value of the integral in (12) is $q^{n^2}$, if $q$ is an abbreviation for

(15)    $q = \exp(-2\pi^2\sigma^2)$.

Consequently, if $r(t, q)$ is defined, in terms of the above $r(t; \sigma)$, by placing

(16)    $r(t, q) = r(t; \sigma)$ in virtue of (15),

then (11) shows that³

(17)    $r(t, q) = -\pi^{-1} \sum_{n=1}^{\infty} (-1)^n n^{-1} q^{n^2} \sin 2\pi n t.$

It will be noted that the range, $0 < \sigma < \infty$, of the standard deviation is mapped
by (15) on the range

(18)    $0 < q < 1$,

and that $\sigma$ decreases or increases according as $q$ increases or decreases.
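
As a numerical cross-check of (17) against the integral form (5) in the normal case, the sketch below evaluates both. It is an illustration only: the function names, the truncation parameters `terms` and `span`, and the closed-form cell integrals (obtained from the standard normal distribution function via `math.erf`) are assumptions introduced here, not part of the paper.

```python
import math


def r_series(t, sigma, terms=200):
    """Series (17) with q = exp(-2 pi^2 sigma^2), truncated after `terms` terms."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2)
    total = 0.0
    for n in range(1, terms + 1):
        total += (-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
    return -total / math.pi


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def r_direct(t, sigma, span=40):
    """Formula (5): integral of s(x) phi(x - t), summed cell by cell.

    On the cell n - 1/2 < x < n + 1/2 the weight s(x) equals x - n, and the
    integral of (x - n) against a normal density has a closed form.
    """
    total = 0.0
    for n in range(-span, span + 1):
        a = (n - 0.5 - t) / sigma
        b = (n + 0.5 - t) / sigma
        total += (t - n) * (_cdf(b) - _cdf(a)) + sigma * (_pdf(a) - _pdf(b))
    return total


for sigma in (0.1, 0.3, 0.6):
    for t in (0.1, 0.25, 0.4):
        print(sigma, t, r_series(t, sigma), r_direct(t, sigma))
```

The two columns printed for each pair should agree to rounding error; the truncation limits are adequate only for moderate $\sigma$ and $|t| \le \tfrac{1}{2}$.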

Let partial differentiations with respect to $t$ and $q$ be denoted by primes and
subscripts, respectively:

(19)    $f' = \partial f/\partial t$,    $f_q = \partial f/\partial q$.

Thus, from (17),

(20)    $r'(t, q) = -2 \sum_{n=1}^{\infty} (-1)^n q^{n^2} \cos 2\pi n t$

and, as easily verified from (17),

(21)    $r_q(t, q) = (-4\pi^2 q)^{-1} r''(t, q)$.

Let $\theta(t, q)$ be defined by

(22)    $\theta(t, q) = 1 + 2 \sum_{n=1}^{\infty} q^{n^2} \cos nt$

(so that $\theta(t, q)$ is, in the main, the elliptic theta-function usually denoted by
$\vartheta_3$). It is known that

(23)    $\theta(t, q) > 0$

and that⁴

(24)    $\theta'(t, q) < 0$ if $0 < t < \pi$ (hence, $\theta'(t, q) > 0$ if $-\pi < t < 0$).

The above assertions will be deduced from these facts.

³ Cf. F. Zernike, loc. cit.

⁴ For a simple proof, cf. A. Wintner, "On the shape of the angular case of Cauchy's
distribution," Annals of Math. Stat., Vol. 18 (1948), pp. 589-593, §6.





6. Proof of (I)-(II) and (i)-(ii). First, it is seen from (17) and (22) that

(25)    $r'(t, q) = 1 - \theta(2\pi t - \pi, q)$.

Hence,

(26)    $r''(t, q) = -2\pi\,\theta'(2\pi t - \pi, q)$.

If (26) is compared with (24), it is seen that

(27)    $r''(t, q) < 0$ if $0 < t < \tfrac{1}{2}$ (hence, $r''(t, q) > 0$ if $-\tfrac{1}{2} < t < 0$).

Consequently, (I) and (II) follow, since, in view of (17),

(28)    $r(\pm\tfrac{1}{2}, q) = 0 = r(0, q)$.

Next, (21) and (27) imply that

(29)    $r_q(t, q) > 0$ for $0 < t < \tfrac{1}{2}$.

Hence, (ii) follows from the fact that $q$ is a decreasing function of $\sigma$.
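
The identity (25) and the monotonicity (29) can be spot-checked from the series (17), (20) and (22). The sketch below is only a numerical illustration at a few arbitrarily chosen points; the function names and the fixed truncation `TERMS` are assumptions made here.

```python
import math

TERMS = 400  # truncation of every series; ample for the q values used below


def r(t, q):
    """Series (17) for r(t, q)."""
    s = sum((-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
            for n in range(1, TERMS + 1))
    return -s / math.pi


def r_prime(t, q):
    """Series (20) for the t-derivative of r."""
    return -2.0 * sum((-1) ** n * q ** (n * n) * math.cos(2.0 * math.pi * n * t)
                      for n in range(1, TERMS + 1))


def theta(u, q):
    """Series (22) for theta(u, q)."""
    return 1.0 + 2.0 * sum(q ** (n * n) * math.cos(n * u)
                           for n in range(1, TERMS + 1))


# Identity (25): r'(t, q) = 1 - theta(2 pi t - pi, q).
for q in (0.2, 0.5, 0.9):
    for t in (0.05, 0.2, 0.45):
        gap = r_prime(t, q) - (1.0 - theta(2.0 * math.pi * t - math.pi, q))
        print(q, t, gap)

# Property (29): at fixed t in (0, 1/2), r increases with q.
print([r(0.2, q) for q in (0.2, 0.5, 0.6)])
```

Agreement at a handful of points is of course no proof; it only guards against a misreading of the signs in (17), (20) and (25).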

As to (i), let i = i{q) denote that (unique) f-value on 0 < f < | at which 
r{t, g) assumes its maximum value, say r\ so that 

(30) r’’ = r(f(g), g), (0 < i(g) < |). 

Clearly, t - iiq) is the only i-value on 0 < i < ^or which 

(31) r'{t, q) = 0 

Since r'({, g) possesses continuous partial deiivatives with respect to t and g, 
and stnce (27) implies that its partial derivative with respect to /, namely, r"{t, g), 
does not vanish at f = 1(g), it follows that the solution t = 1(g) of the equation 

(31) possesses a continuous derivative Hence, the function (30) possesses a 
continuous derivative with respect to g, namely, 

(32) ^ = r'iiiq), g) ~ + rMi)> ?) 

But since t = i(g) is a solution of (31), the identity (32) can be reduced to 

I* = r,m, !), (0 < <(5) < !)■ 

Consequently, (i) follows from (29), since g is a decreasing function of c. 
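
Assertions (II) and (i) can be illustrated numerically from the series (17). The following sketch is an assumption-laden illustration, not part of the proof: the maximum over $t$ is approximated on a grid, the function names are hypothetical, and the adaptive truncation is practical only for $\sigma$ that is not extremely small.

```python
import math


def r(t, sigma):
    """Series (17) with q = exp(-2 pi^2 sigma^2), summed until terms vanish."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2)
    total, n = 0.0, 1
    while True:
        weight = q ** (n * n)
        if weight < 1e-17:
            break
        total += (-1) ** n * weight * math.sin(2.0 * math.pi * n * t) / n
        n += 1
    return -total / math.pi


def max_over_t(sigma, steps=500):
    """Grid approximation to Max r(t; sigma) over 0 < t < 1/2."""
    return max(r(k / (2.0 * steps), sigma) for k in range(1, steps))


def second_difference(t, sigma, h=0.01):
    """Discrete analogue of r''(t); negative where the graph is a convex arch."""
    return r(t - h, sigma) - 2.0 * r(t, sigma) + r(t + h, sigma)


maxima = [max_over_t(s) for s in (0.1, 0.2, 0.4)]
print(maxima)  # expected to decrease as sigma grows, by (i)
print(all(second_difference(0.05 * k, 0.2) < 0 for k in range(1, 10)))
```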



WEIGHING DESIGNS AND BALANCED INCOMPLETE BLOCKS 

By K. S. Banerjee
Pusa, Bihar, India

1. Introduction. Following a paper by Hotelling [1] on the weighing
problem, Kishen [4] and Mood [2] furnished generalized solutions. This note consists
of some additional remarks on the weighing problem when the weighing is
restricted to be made on one pan.

Hotelling remarked that when the problem was to determine a particular
difference or any other linear function of the weights, a different design should
be sought to minimize the variance. An account of efficient designs of this kind
has also been furnished in this note. The notations used by Hotelling and
Mood have been used here.

2. Chemical balance problem. It has been shown by Mood that when
$N \equiv 0 \pmod 4$, an optimum design exists if a Hadamard matrix $H_N$ exists, and
is obtained by using any $p$ columns of $H_N$. When $N \equiv i \pmod 4$, $(i = 1, 2, 3)$,
very efficient designs are obtained either by adding to or deleting from the rows
of the Hadamard matrix, making the resultant number of rows equal to $N$.

It has further been shown by Mood in connection with this class of designs
that arrangements¹ are available which are more efficient than the one obtained
by repeating the row of ones. As a matter of fact, if any row other than the row
of ones be repeated, this will lead to a design of the same efficiency as in the case
of repeated addition of the row of ones, for the determinant of $X'X$ will remain
exactly identical. That this is so will be clear from the following properties
showing the connection of the matrix $X$ with the determinant $|a_{ij}|$:

(i) Any two rows of the matrix $X$ can be interchanged without changing the
determinant $|a_{ij}|$.

(ii) Any two columns of the matrix $X$ can be interchanged without changing
the determinant $|a_{ij}|$.

(iii) The signs of all the elements in a column of the matrix $X$ may be changed
without changing the determinant $|a_{ij}|$.

3. Spring balance problem. Mood has exhaustively discussed the designs
when $N > p$. Efficient designs under this class will, however, be available from
the arrangements afforded by balanced incomplete block designs discussed in
[3]. These designs will be represented by certain of the efficient submatrices of
the $P_k$ of Mood.

Usually $v$ and $b$ are used to denote respectively the number of varieties and the
number of blocks in the above mentioned designs. Here $v$ will take the place of

¹ This had been independently shown by me before the paper of A. M. Mood was brought
to my notice by H. Hotelling.




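$p$, the number of objects to be weighed, and $b$ that of $N$, the number of weighings
that can be made. The matrix $X'X$ in this case will take the form

(1)    $X'X = \begin{pmatrix}
         r & \lambda & \lambda & \cdots & \lambda \\
         \lambda & r & \lambda & \cdots & \lambda \\
         \lambda & \lambda & r & \cdots & \lambda \\
         \cdot & \cdot & \cdot & \cdots & \cdot \\
         \lambda & \lambda & \lambda & \cdots & r
       \end{pmatrix}.$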

The variance of the estimated weight of each of the $p$ objects for such a design
can be easily seen to be

(2)    $\frac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}\,\sigma^2$    for zero bias,

where $p$ is the number of objects to be weighed and $r$ and $\lambda$ have meanings similar
to those in connection with balanced incomplete block designs; that is, $r$ is the
number of times each object is weighed, and $\lambda$ is the number of times each pair
of objects is weighed together.

Though the minimum minimorum of $\sigma^2/N$ can never be attained by the objects
to be weighed under such designs, $\sigma^2/N$ may however be kept as the standard
with which the efficiency of a given design may be calculated. The efficiency
of the above design will therefore for zero bias be

(3)    $\frac{(r - \lambda)\{r + \lambda(p - 1)\}}{N\{r + \lambda(p - 2)\}}.$

The identities well known in the theory of balanced incomplete blocks,

    $bk = vr$,    $\lambda(v - 1) = r(k - 1)$,

may, upon replacing $b$ by $N$ and $v$ by $p$ to accord with the notation of weighing
designs, be written

    $r = Nk/p$,    $\lambda = r(k - 1)/(p - 1)$.

Upon substituting these in (3) we obtain the efficiency factor in the form

(4)    $\frac{k^2 (p - k)}{p(pk - 2k + 1)},$

where $k$ is the number of plots per block or the number of objects that can be
weighed at a time.
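
The reduction of (3) to (4) can be confirmed in exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, and of the three parameter sets the first and third are designs mentioned in this note while the second (a design with $k = 3$, $p = 7$) is added here as an extra example.

```python
from fractions import Fraction


def efficiency_from_bibd(p, N, r, lam):
    """Efficiency (3): (r - lam){r + lam(p - 1)} / (N{r + lam(p - 2)})."""
    return Fraction((r - lam) * (r + lam * (p - 1)), N * (r + lam * (p - 2)))


def efficiency_from_block_size(p, k):
    """Efficiency (4): k^2 (p - k) / (p (pk - 2k + 1))."""
    return Fraction(k * k * (p - k), p * (p * k - 2 * k + 1))


# (p, N, k, r, lam) for three balanced incomplete block designs.
DESIGNS = [(7, 7, 4, 4, 2), (7, 7, 3, 3, 1), (15, 15, 8, 8, 4)]

for p, N, k, r, lam in DESIGNS:
    # The identities Nk = pr and lam(p - 1) = r(k - 1) must hold.
    assert N * k == p * r and lam * (p - 1) == r * (k - 1)
    print(p, k, efficiency_from_bibd(p, N, r, lam), efficiency_from_block_size(p, k))
```

Both functions return the same fraction for every design that satisfies the two identities, which is the content of the substitution described above.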

If instead of adopting repetitions of $P_k$, only $\binom{p}{k}$ weighings be made in all,
the efficiency factor calculated for such a combinatorial design would be

    $\frac{(r - \lambda)\{r + \lambda(p - 1)\}}{b\{r + \lambda(p - 2)\}}$    for zero bias,

where $r = \binom{p-1}{k-1}$, $\lambda = \binom{p-2}{k-2}$ and $b = \binom{p}{k}$. The above expression on
simplification reduces to (4).

It will be noticed that the efficiency of such designs depends only upon the
total number of objects to be weighed and the number of such objects that can
be weighed at a time.

These designs have the advantage that all the weights are estimated with
equal precision. If a slightly larger number of weighings than what is afforded
by the number of blocks in a balanced incomplete block design has to be made,
all the objects may be weighed together and this weighing be repeated as many
times as required. This will be equivalent to the repeated addition of the row
of ones. The repetition of the row of ones in particular is necessary to make the
weights estimable with equal precision, which, however, may be demanded at
times as a matter of necessity in certain experiments. Otherwise, any other
single row or different rows of the matrix $X$ may be repeated, making the number
of rows of the matrix $X$ equal to the number of weighings proposed to be made
in all.

From the practical point of view also, it will be advantageous to connect the
designs for weighing with the already existing balanced incomplete block
designs, which have been highly developed in recent years and are being extensively
used in agro-biological investigations.

4. Spring balance design for small $p$. Under this class of designs, Mood has
found the most efficient design for $p = 7$. It is given by

    $L_7 = \begin{pmatrix}
      1 & 0 & 1 & 0 & 1 & 0 & 1 \\
      0 & 1 & 1 & 0 & 0 & 1 & 1 \\
      0 & 0 & 0 & 1 & 1 & 1 & 1 \\
      1 & 1 & 0 & 0 & 1 & 1 & 0 \\
      0 & 1 & 1 & 1 & 1 & 0 & 0 \\
      1 & 0 & 1 & 1 & 0 & 1 & 0 \\
      1 & 1 & 0 & 1 & 0 & 0 & 1
    \end{pmatrix}.$

This $L_7$ is easily recognized to be the design for $k = 4$, $b = 7$, $v = 7$, $r = 4$,
$\lambda = 2$, given by an orthogonal series [3]. It is therefore seen that Hadamard
matrices will lead to a new method of constructing balanced incomplete block
designs of a certain class. For example $H_{16}$ and $H_{20}$ will lead respectively to the
designs for $k = 8$, $b = 15$, $v = 15$, $r = 8$, $\lambda = 4$ (or for $k = 7$, $b = 15$, $v = 15$,
$r = 7$, $\lambda = 3$) and for $k = 10$, $b = 19$, $v = 19$, $r = 10$, $\lambda = 5$ (or $k = 9$, $b = 19$,
$v = 19$, $r = 9$, $\lambda = 4$). These designs also satisfy the condition of maximum
efficiency, by virtue of the fact that $|L_N|$ will have the value

    $(N + 1)^{(N+1)/2} / 2^N,$

as shown by Mood.
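
The balanced-incomplete-block structure of the $7 \times 7$ design above can be verified mechanically. The sketch below assumes the matrix as transcribed here; the helper names `gram` and `det` are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

# The 7 x 7 spring balance design, rows = weighings, columns = objects.
L7 = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]


def gram(X):
    """Return X'X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if A[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d
        d *= A[c][c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            for j in range(c, n):
                A[i][j] -= f * A[c][j]
    return d


G = gram(L7)
print(G[0])          # first row of X'X: r on the diagonal, lambda elsewhere
print(abs(det(L7)))  # to be compared with (N + 1)^((N + 1)/2) / 2^N for N = 7
```

With $r = 4$ and $\lambda = 2$ the matrix $X'X$ has the form (1), and the absolute value of the determinant can be compared with $8^4/2^7 = 32$.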


5. Determination of a linear function of the objects. An orthogonalized
design which is cent percent efficient to determine individually the weight of $p$
unknown objects is not necessarily the design of maximum efficiency for the
estimation of a linear function of the objects. To illustrate this, let there be three
objects, the weights $\theta_1$, $\theta_2$, $\theta_3$ of which have to be estimated on a balance
corrected for zero bias, and let us, for this purpose, concentrate on the design
characterized by the matrix given below.

(5)

As has been indicated in the previous papers, the variance of each of the unknown
objects comes out to be $\tfrac{1}{4}\sigma^2$, which is the minimum minimorum, and as such the
above design enjoys the cent percent efficiency, when the question of individual
estimation is concerned. But in estimating a linear function of the objects,
for instance the total weight, designs more efficient than this are available.
The variance of $l_1\theta_1 + l_2\theta_2 + l_3\theta_3$ is known to be

(6)    $\sigma^2 \sum_{i} \sum_{j} l_i l_j c_{ij},$

where $c_{ij}$ denotes the elements of the matrix reciprocal to the matrix $X'X$.
As the above design furnishes the estimates of the unknown objects orthogonally,
the variance of the estimated total weight of the three objects will be given by
$\tfrac{3}{4}\sigma^2$. If, however, the design given by the matrix

(7)    $\begin{pmatrix}
         1 & 1 & 1 \\
         1 & 1 & 0 \\
         1 & 0 & 1 \\
         0 & 1 & 1
       \end{pmatrix}$

be adopted, the variance of the estimate of the total weight may be easily seen
to be $(3/7)\sigma^2$, by putting $l_1 = l_2 = l_3 = 1$. $(3/7)\sigma^2$ is evidently less than $\tfrac{3}{4}\sigma^2$.
Therefore with four weighings, the design characterized by (7) is more efficient
in estimating the total weight than that characterized by (5). A still more
efficient design for getting the total weight is simply to weigh all the objects
together four times.
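
The comparison of the two four-weighing designs can be reproduced from formula (6). In the sketch below the matrix `DESIGN_7` is the design (7) of the text; the matrix `ORTHOGONAL` is an assumption standing in for (5), which is not legible in the source, and is simply one orthogonal two-pan design with $X'X = 4I$. Function names are hypothetical.

```python
from fractions import Fraction

DESIGN_7 = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
# Assumed stand-in for matrix (5): any design with X'X = 4I gives the same variances.
ORTHOGONAL = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]


def gram(X):
    """Return X'X."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def inverse(M):
    """Gauss-Jordan inverse over the rationals (M assumed nonsingular)."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for i in range(n):
            if i != c and A[i][c] != 0:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [row[n:] for row in A]


def variance_factor(X, l):
    """Formula (6) in units of sigma^2: sum over i, j of l_i l_j c_ij."""
    c = inverse(gram(X))
    n = len(l)
    return sum(l[i] * c[i][j] * l[j] for i in range(n) for j in range(n))


print(variance_factor(ORTHOGONAL, [1, 0, 0]))  # single object, orthogonal design
print(variance_factor(ORTHOGONAL, [1, 1, 1]))  # total weight, orthogonal design
print(variance_factor(DESIGN_7, [1, 1, 1]))    # total weight, design (7)
```

Weighing all three objects together four times is not covered by this function, since $X'X$ is then singular; the total weight is nevertheless estimable, with variance $\sigma^2/4$.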


6. Designs with arrangements afforded by balanced incomplete blocks. The
necessity for an efficient design to estimate any linear function of the objects
(or to be precise, say to estimate the total weight) will perhaps arise only when
the objects cannot all be weighed at a time collectively on a single pan. Here
also, an efficient design under the supposition that all the objects cannot be
weighed together is afforded by the arrangements in balanced incomplete blocks.
In such a design, the diagonal elements in the matrix reciprocal to $X'X$ will be
all positive and equal to

(8)    $\frac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}},$

while the remaining elements in the reciprocal matrix will be negative and equal
to

(9)    $\frac{-\lambda}{(r - \lambda)\{r + \lambda(p - 1)\}}.$

Using the generalized form of (6) and admitting of the possibility that any of the
arbitrary constants $l_i$ may be negative, the variance of the linear function
$\sum l_i \theta_i$ may be easily seen to be

(10)    $\left( \frac{\sum l_i^2}{r - \lambda} - \frac{\lambda (\sum l_i)^2}{(r - \lambda)\{r + (p - 1)\lambda\}} \right) \sigma^2.$

If, however, in the above expression, the coefficients $l_i$ are equal to 1, (10) is the
variance of the estimated total weight, and reduces to

(11)    $\frac{p\,\sigma^2}{r + (p - 1)\lambda}.$

When there are $N$ weighings in all, the minimum variance that can be reached
is $\sigma^2/N$ and will be attained, it appears, only when all the objects are weighed
together and the weighing is repeated $N$ times. The efficiency of a given design
may therefore be calculated with reference to $\sigma^2/N$. Remembering that the
number of weighings takes the place of the number of blocks and $p$ the place of $v$,
the efficiency of the design will reduce to $(k/p)^2$, where $k$ is the number of plots per
block, i.e. the number of objects that can be weighed at a time.

If, however, the combinatorial arrangement is adopted, weighing all possible
combinations of $k$ objects and making $\binom{p}{k}$ weighings in all, the same efficiency
as above will be obtained for such a design.

Given $k$, the above expression of efficiency will therefore be the deciding factor
for choice between an arrangement of balanced incomplete block design and all
possible combinations of $k$ objects.
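
Formulas (10) and (11) and the efficiency $(k/p)^2$ can be checked in exact arithmetic for a particular design. The sketch below uses the parameters $p = 7$, $N = 7$, $k = 4$, $r = 4$, $\lambda = 2$ quoted earlier in this note; the function names are hypothetical and all variances are in units of $\sigma^2$.

```python
from fractions import Fraction


def var_linear(l, r, lam):
    """Formula (10): variance of sum(l_i * theta_i), in units of sigma^2."""
    p = len(l)
    sum_sq = sum(v * v for v in l)
    sq_sum = sum(l) ** 2
    return (Fraction(sum_sq, r - lam)
            - Fraction(lam * sq_sum, (r - lam) * (r + (p - 1) * lam)))


def var_total(p, r, lam):
    """Formula (11): variance of the estimated total weight."""
    return Fraction(p, r + (p - 1) * lam)


def efficiency_total(p, N, r, lam):
    """Efficiency relative to the standard sigma^2 / N."""
    return Fraction(1, N) / var_total(p, r, lam)


p, N, k, r, lam = 7, 7, 4, 4, 2
print(var_linear([1] * p, r, lam), var_total(p, r, lam))
print(efficiency_total(p, N, r, lam), Fraction(k, p) ** 2)
print(var_linear([1, -1, 0, 0, 0, 0, 0], r, lam))  # a difference of two weights
```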


7. Design of maximum efficiency. Designs leading to the matrix $X'X$ of
the type (1) have certain advantages inasmuch as the variances of the individual
objects are equal, as are also the covariances between all possible pairs. The
variance of the estimated total weight in such a design is given by (11). To
minimize the variance thus obtained, the expression

(12)    $r + (p - 1)\lambda$

has to be the maximum for a given value of $p$. In an arrangement of the
balanced incomplete block type or in an arrangement with all possible combinations
of $k$ objects being weighed at a time, (12) would reduce to $rk$ and would therefore
increase with the increasing value of $k$. This shows that the estimation of the
total weight will have increased precision if more of the objects are weighed at a
time.

If all the objects could be weighed at a time and both the pans be used for the
purpose, some of the elements in the matrix $X$ will be $-1$ instead of 0. This
would increase the value of $r$ but would decrease the value of $\lambda$. To devise the
best possible design, therefore, account will have to be taken simultaneously of
$r$ and $\lambda$.


REFERENCES

[1] Harold Hotelling, "Some improvements in weighing and other experimental
    techniques", Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.
[2] A. M. Mood, "On Hotelling's weighing problem", Annals of Math. Stat., Vol. 17 (1946),
    pp. 432-446.
[3] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
    Research, Oliver and Boyd, London, 1938, pp. 10-13.
[4] K. Kishen, "On the design of experiments for weighing and making other types of
    measurements", Annals of Math. Stat., Vol. 16 (1945), pp. 294-300.
[5] C. R. Rao, "On the most efficient designs in weighing", Sankhyā, Vol. 7 (1946), p. 440.



BOUNDS FOR SOME FUNCTIONS USED IN SEQUENTIALLY TESTING 
THE MEAN OF A POISSON DISTRIBUTION¹

By Leon H. Herbach

Brooklyn College

1. Introduction. Let $z = \log \frac{f(x, \lambda_1)}{f(x, \lambda_0)}$, where $f(x, \lambda_i) = e^{-\lambda_i} \lambda_i^x / x!$,
$(i = 0, 1)$, is the elementary probability law of a Poisson variate $X$, under the
hypothesis that the mean is equal to $\lambda_i$. Without loss of generality we shall
assume $\lambda_1 > \lambda_0$.

Let $H_0$ be the hypothesis that the distribution of $X$ is given by $f(x, \lambda_0)$. Wald
[1, pp. 286-287] has devised general upper and lower bounds for the probability
of accepting $H_0$, when $\lambda$ is the true value of the parameter, and the sequential
probability ratio test is used. This probability is called the operating-characteristic
function and is designated by $L(\lambda)$. Using these results he has
computed the bounds for the binomial and normal distributions [2, pp. 137-142].
We shall do the same thing for the Poisson distribution, since the restrictions
[1, p. 284, conditions I to III] under which these general limits are valid can
rather easily be shown to apply to the Poisson distribution, if we make the
further restriction that $E(z) \ne 0$.

These general results are 


1 -K 
SA'‘ - B>' 


< 1 - L(X) < 




A'* - 7iR''’ 


and 


( 1 ) 


1 - A'‘ 


< L(X) < 


1 - 


if li > 0, 


if /i < 0, 


B^ - r,A’' ’ 

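where $\alpha$, $\beta$ are probabilities of committing errors of the first and second kind
respectively and

(2)    $A = (1 - \beta)/\alpha$,    $B = \beta/(1 - \alpha)$,

       $\eta = \mathrm{g.l.b.}_{\zeta}\ \zeta\,E(e^{hz} \mid e^{hz} \le 1/\zeta)$,    $\zeta > 1$;

       $\delta = \mathrm{l.u.b.}_{\rho}\ \rho\,E(e^{hz} \mid e^{hz} \ge 1/\rho)$,    $0 < \rho < 1$;

and $h$ is the non-zero root of the expression $E e^{hz} = 1$. Hence the only remaining
unknowns are $\eta$ and $\delta$.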


¹ The author is indebted to Professor A. Wald for suggesting the problem which led to
this note and for helpful discussions.




The following bounds to $E_n$, the expected number of observations required
by the sequential probability ratio test defined by $\alpha$, $\beta$, have been derived [1, pp.
143-147]:

    $\frac{L(\lambda)(\log B + \xi') + [1 - L(\lambda)] \log A}{Ez} \;{\le \atop \ge}\; E_n \;{\le \atop \ge}\; \frac{L(\lambda) \log B + [1 - L(\lambda)](\log A + \xi)}{Ez},$

the upper or lower inequality signs holding according as $Ez > 0$ or $Ez < 0$, where

(3)    $\xi' = \mathrm{Min}_r\ E(z + r \mid z + r \le 0)$,

and

(4)    $\xi = \mathrm{Max}_r\ E(z - r \mid z - r \ge 0)$,    $(r \ge 0)$.

Using the limits to $L(\lambda)$, we then find $\xi$ and $\xi'$ which determine $E_n$.

2. Special terminology. By an almost-increasing function we shall mean one
that has the following properties: If $x$ is any point of discontinuity, then (a) $x + k$
is also, where $k$ is any integer, and $x + l$ is a point of continuity if $l$ is not integral,
(b) $f(x - \epsilon) < f(x - \epsilon') < f(x)$ for $0 < \epsilon' < \epsilon < 1$, (c) $f(x - 1) < f(x)$, (d)
$\lim_{\epsilon \to 0} f(x + \epsilon) = f(x+) < f(x)$, (e) $f(x - 1 +) < f(x+)$. It is clear that the
minimum value for $f(y)$ in any closed interval $[a, b]$ is equal to $\min [f(a), f(a'+)]$,
where $a'$ is defined as $a$ if the closed interval contains no discontinuity, and as
the leftmost point of discontinuity otherwise. As special cases, if $a$ is a point of
discontinuity this minimum is $f(a+)$, and if $x < a < b < x + 1$ the minimum
is $f(a)$.

Almost-decreasing functions are defined similarly except that the inequalities
go the other way. In this case the maximum in the interval is $\max [f(a), f(a'+)]$
and we have special cases as above.

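3. The case $h > 0$. Since $e^z = e^{-c} a^x$, where $a = \lambda_1/\lambda_0$ and $c = (\lambda_1 - \lambda_0)$, the
condition $e^{hz} \le 1/\zeta$ may be expressed as $e^{-ch} a^{hx} \le 1/\zeta$, whence

(5)    $x \le c/\log a - \log \zeta/(h \log a) = s - r$ (say).

Since $x \ge 0$, $r \le s$. Hence $0 < r \le s$. Also

(6)    $E e^{hz} = \sum_{x=0}^{\infty} (e^{-c} a^x)^h\,\frac{e^{-\lambda} \lambda^x}{x!} = \exp(-ch - \lambda + \lambda a^h),$

and

(7)    $\zeta E(e^{hz} \mid e^{hz} \le 1/\zeta) = \zeta E(e^{-ch} a^{hx} \mid x \le s - r).$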
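
The closed form in (6) and the non-zero root $h$ of $E e^{hz} = 1$ can be computed directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the Poisson series is truncated after `terms` terms, and the root is located by bisection on $g(h) = \lambda(a^h - 1) - ch$, which is convex in $h$ and vanishes at $h = 0$.

```python
import math


def mgf_series(h, lam, lam0, lam1, terms=200):
    """E exp(h z) summed term by term over the Poisson(lam) distribution."""
    log_a, c = math.log(lam1 / lam0), lam1 - lam0
    prob = math.exp(-lam)  # Poisson probability of x = 0
    total = 0.0
    for x in range(terms):
        total += math.exp(h * (-c + x * log_a)) * prob
        prob *= lam / (x + 1)
    return total


def mgf_closed(h, lam, lam0, lam1):
    """Closed form (6): exp(-c h - lam + lam a^h)."""
    a, c = lam1 / lam0, lam1 - lam0
    return math.exp(-c * h - lam + lam * a ** h)


def nonzero_root(lam, lam0, lam1):
    """Non-zero h with E exp(h z) = 1, i.e. lam (a^h - 1) = c h."""
    log_a, c = math.log(lam1 / lam0), lam1 - lam0
    if abs(lam * log_a - c) < 1e-12:  # E(z) = 0: no non-zero root
        raise ValueError("E(z) = 0; the non-zero root does not exist")

    def g(h):
        return lam * math.expm1(h * log_a) - c * h

    lo = math.log(c / (lam * log_a)) / log_a  # minimiser of g, where g < 0
    hi = 2.0 * lo
    while g(hi) <= 0.0:                       # move outward until g > 0
        lo, hi = hi, 2.0 * hi
    for _ in range(200):                      # bisection keeps g(lo) < 0 < g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(mgf_series(0.7, 1.2, 1.0, 2.0), mgf_closed(0.7, 1.2, 1.0, 2.0))
print(nonzero_root(1.0, 1.0, 2.0), nonzero_root(2.0, 1.0, 2.0))
```

When $\lambda = \lambda_0$ the root is $h = 1$ and when $\lambda = \lambda_1$ it is $h = -1$, which gives a quick check of the routine.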





From (5), $\zeta = a^{hr}$ and (7) becomes

(7.1)    $a^{hr} e^{-ch}\,\frac{\sum_{x=0}^{[s-r]} e^{-\lambda} \lambda^x a^{hx} / x!}{\sum_{x=0}^{[s-r]} e^{-\lambda} \lambda^x / x!},$

where $[s - r]$ is the largest integer $\le (s - r)$. Our problem is to minimize (7)
with respect to $\zeta$. Since $r$ is a strictly increasing function of $\zeta$, this is equivalent
to minimizing $a^{hr} C/D = \theta$ (say) with respect to $r$, where

    $C = \sum_{x=0}^{[s-r]} \frac{(\lambda a^h)^x}{x!},$

and

    $D = \sum_{x=0}^{[s-r]} \frac{\lambda^x}{x!}.$

It will be shown that (7.1) is an almost-increasing function of $r$ and therefore
the minimum occurs at either $r = 0$ or $r = \nu+$, where $\nu = s - [s]$, since the
saltuses occur at $r = \nu + k$ for $k = 0, 1, 2, \cdots, [s]$.

Since $a^{hr}$ is an increasing function of $r$ and $C/D$ remains constant as long as
$[s - r]$ remains constant, condition (b) is fulfilled.

Conditions (c) to (e) refer to the saltuses only; hence, to show them, we may
assume, without loss of generality, that $r$ and $s$ are integral. We proceed by
induction, using the notation $\theta(w)$ to mean the value of $\theta$ when $r = w$, to show (c).
First we prove the following:

Lemma A. $\theta(s) > \theta(s - 1)$.

Proof: Since we assumed $\lambda_1 > \lambda_0$ and $h > 0$, $a^h > 1$. Hence $(1 + \lambda) a^h >
1 + \lambda a^h$, whence, a fortiori, $a^h > (1 + \lambda a^h)/(1 + \lambda)$.

To show that if 0(r -f 1) > 0(r), then 0(r) > 0(r — 1), we shall show that 

(8) CD -t- < GDd' + Gb 

implies 


(9) CD + Dhqd^'^^^’' < CDd -f Cbqd, 

where n = s - r,b = \’'/n\, q = X/(n. + 1). 


Since, as we shall see below, 

(10) Dbd^+^% - 1)< Cb{qd - 1), 


or 


( 11 ) Dd”^^% - 1 )< dqd - 1 ), 

addition of (8) and (10) yields the desired result, (9). 

It now remains to prove (11) or that 



( 12 ) 





Setting (6) equal to 1 we get Xa'” = ch + which when substituted in (12) 
yields 


■ (oh -j- X)* 


{ch + X)"'-'(X - n ^ 1) E ^ < r*\ch + X - n - 1) E- , 

X\ s=.0 X\ 


Upon letting p = c/i + X, we have 


X - (n + 1) V ^ P — (ft + 1) V ^ ^ w(„\ 
^ hx\ 


X"+i 


XnQ X I 


say 


Then our problem reduces to showing that Fiy) is increasing inO<X<y<p 
or that the derivative with respect to y, F' (y) is positive. 


F>{y) = - E 


(n - x){f-^-^) 


+ (ft+ DE^"^" 


■') 


:i (rr + l)! 


+ (n + 1) y 


2. .-n~2 


>(n + ify " ^ smce (w + 1) > (a; + 1); 


> 0 since y > 0. 


Thus condition (c) is demonstrated. To show (d) we must show that 
eir +) < 6{r), which means that 


C - bo”'* 
D - b 


a 


c 

D • 


But this IS true if U < 
to showing that 


Do”'* which is easily verfied. Condition (e) is equivalent 


C . rKC- boT' 

“ 5 TTrir 


which IS proved just as (c) was. 
Hence, 


7) = mm s e 


(13) 




1-0 


a:' 


„hx / [a] 

/ XeeO • 


[s—l] -\.x hx / ["-U 

-^h e X o / 6 X 


_,h -ch y e A u / Y 

x\ / *=0 


a e 




x\ 


As special cases we have (i) if s is integral, v is the latter with v = 0 and (li) 
if s < 1 (b) IS the only applicable condition and we have an ordinary increasing 
function, hence y is the former. 

Similarly, it may be shown that 

(14) S = max [e-“'‘D(a"'*| a; > (s}), o"'‘'*r''‘D(fl*'“ l.'c > {s + 1})], 

where (s) is the smallest integer > s and a = {s} — s. Here there is only one 
special case, namely (i). If < 0, 5 is the larger of the two expressions on the 
right side of (13) and y is the smaller of the two corresponding expressions in 

(14). 





4. Since $z = -c + x \log a$, $\xi$ may be written

    $\mathrm{Max}_t\ \log a \cdot E(x - t \mid x - t \ge 0),$

where $t = (r + c)/(\log a)$. Hence $s = c/\log a \le t < \infty$. Therefore if we can
show that $E(x - t \mid x \ge t) = \gamma(t)$ (say) is an almost-decreasing function of $t$,
we will know that $\xi$ occurs either when $t = s$ or $\{s\}+$, since, as will be seen, the
jumps occur at integral $t$.

To show (c) we make use of the following, which is easily proven:

Lemma B. Let X, Y, Z each be greater than zero. Then a necessary and sufficient condition that $\frac{X}{Y} < \frac{X + 1}{Y + Z}$ is that $XZ < Y$.

Therefore, to show for integral t that 
( 15 ) 7(0 < yit - 1 ), 

or that 


Jlix - i) - Z) (a: - 0 -| + S —I 
x-i X’ ^ a:' i_( a;! 

Ma* ^ oox® V 1 > 

V — V iL 4- ^ 

ittxl ^tx! (f - 1)1 


we need only show that, for all integral I, 

^ ^ («-l)l£l a:! ^ l^tx\j' 

Since both sides of (16) are power series in X where the exponents start with 2t 
we need only show that the coefficient of every term on the left is less than the 
corresponding term on the right. 

In the case of the coefficient of (j > 0) we have to show that 

_ 2i + 1 ^2 2 

(t -f 2j + l)l(i - 1)1 {t+ 2j)m "^ (< + 2j - l)l(i -f 1)1 


+ ■• + 


{t +J)\{t + j)'’ 


or by multiplying both sides by {2t 4- 2j )' that 


(2i + 1) 


2< -|- 2j 


2t -f 2j 


2t -f 2j 

.t+ 1 


-j- ■ ■ -f 2 


2t + 2j ' 
t+J~ I 


/2t -H 2j\ 

-h = M, say 

\t + j / 

Replacing aU the binomial coefficients on the right by the smallest one we have 





since 


< for n > 2s 


Thus the truth of (16) has been established for even exponents. The odd terms are treated similarly.

Hence, we have shown that $\gamma(t)$ is a strictly decreasing function of t, if t takes on integral values only. We shall now show (b), i.e., that


(17) 


Y^{x ~ t) — (3; _ (-i_ 

/A ^ a;=^[ t—c) X' 

7(0 ~ ^ ~ — 7(^ ~ ^)' 




» \X 

E ^ 


The denominators are equal and each term of the numerator on the right is 
greater than the corresponding term on the left, hence (17) is valid, 

Conditions (a) and (d) can be shown, by showing in a similar manner, that 


(18) y{t +) = 1 + 7(1 4* 1) 


and 7(0 > 1 + 7(^ + 1) for integral t. By using (18) for i and 1 - 1 together 
with (15) we show yit - 1 +) < y{t +), which is condition (e). Thus we have 
shown that 


^ = max 


, , aA e / V' A e 

-C + Iogo L -n- ^ —r> 

j;=.la) X\ / I=|,l X\ 


log a 


-{s| + S 


r —X 

xh e 


M .,1 — X "1 


E 


X e 


^(■+1) / 3!;=|fl-)-l) (c! J 


As in Section 3, is the lower analogue of i.e. 


k' = min {- c -\- E{x\x < [s]), - [s] log a + £(x ] x < [s - l])j, 


and the special cases are as in that section, 


REFERENCES

[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15 (1944), pp. 283-296.

[2] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16 (1945), pp. 117-186.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


THE DISTRIBUTION OF STUDENT'S t WHEN THE POPULATION MEANS ARE UNEQUAL

By Herbert Robbins

Department of Mathematical Statistics, University of North Carolina

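Let $x_1, \cdots, x_N$ be independent normal variates with the same variance $\sigma^2$ and with means $\mu_1, \cdots, \mu_N$ respectively. Set $n = N - 1$ and let

$$ \bar{x} = \sum_{1}^{N} x_i/N, \qquad s^2 = \sum_{1}^{N} (x_i - \bar{x})^2/n, \qquad t = N^{1/2}\bar{x}/s. \tag{1} $$

If all the $\mu_i$ are 0 then t has Student's distribution with n degrees of freedom; its frequency function will be denoted here by

$$ f_{n,0}(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{(n\pi)^{1/2}\,\Gamma\left(\frac{n}{2}\right)} \cdot (1 + t^2/n)^{-(n+1)/2}. \tag{2} $$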

When dealing with situations involving mixtures of populations or in which the mean exhibits a secular trend, it is important to know the distribution of t when the $\mu_i$ are arbitrary; in the general case let

$$ \bar{\mu} = \sum_{1}^{N} \mu_i/N, \qquad \beta^2 = \sum_{1}^{N} (\mu_i - \bar{\mu})^2/N, \qquad \alpha = N\bar{\mu}^2/2\sigma^2, \qquad \lambda = N\beta^2/2\sigma^2. \tag{3} $$

The distribution of t will be shown to depend on the three parameters $n, \alpha, \lambda$. If $\lambda = \beta = 0$, so that all the $\mu_i$ are equal, then the distribution of t determines the power function of the ordinary t test. We shall here consider the case in which $\alpha = \bar{\mu} = 0$, although the $\mu_i$ are different. Denoting the frequency function of t in this case by $f_{n,\lambda}(t)$ we shall show that

$$ f_{n,\lambda}(t) = f_{n,0}(t) \cdot \exp\left\{\frac{-\lambda t^2}{n + t^2}\right\} \cdot F\left(-\tfrac{1}{2},\, n/2,\, -\lambda(1 + t^2/n)^{-1}\right), \tag{4} $$

where F denotes the confluent hypergeometric series, and where, since $\bar{\mu} = 0$,

$$ \lambda = \sum_{1}^{N} \mu_i^2/2\sigma^2. \tag{5} $$

1 

In fact, the general distribution of t, of which (4) represents the case $\alpha = 0$, may be derived as follows. Using the standard orthogonal transformation [1, p. 387] let

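$$ z_i = \sum_{j=1}^{N} c_{ij} x_j, \qquad a_i = \sum_{j=1}^{N} c_{ij} \mu_j \qquad (i = 1, \cdots, N), \tag{6} $$

where

$$ c_{1j} = N^{-1/2} \qquad (j = 1, \cdots, N); \tag{7} $$

then

$$ z_1 = N^{1/2}\bar{x}, \qquad \sum_{2}^{N} z_i^2 = n s^2, \qquad t = n^{1/2} z_1 \Big/ \Big(\sum_{2}^{N} z_i^2\Big)^{1/2}. \tag{8} $$

The joint frequency function of the $z_i$ is easily seen to be

$$ (2\pi)^{-N/2}\sigma^{-N} \cdot \exp\Big\{-\frac{1}{2\sigma^2}\sum_{1}^{N}(z_i - a_i)^2\Big\}, \tag{9} $$

where

$$ a_1 = N^{1/2}\bar{\mu}, \qquad \sum_{2}^{N} a_i^2 = N\beta^2. \tag{10} $$

Thus t is the ratio of a non-central normal variate to the square root of an independent non-central chi-square variate. It is known [2, p. 138] that the frequency function of $q^2 = \sum_{2}^{N} z_i^2/\sigma^2$ is

$$ \tfrac{1}{2}\, e^{-\lambda - q^2/2}\, (\tfrac{1}{2}q^2)^{n/2 - 1} \sum_{j=0}^{\infty} \frac{(\tfrac{1}{2}\lambda q^2)^j}{j!\,\Gamma(n/2 + j)}, \tag{11} $$

where

$$ \lambda = \sum_{2}^{N} a_i^2/2\sigma^2 = N\beta^2/2\sigma^2. \tag{12} $$

The frequency function of $v = z_1/\sigma$ is

$$ g(v) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{(\sigma v - a_1)^2}{2\sigma^2}\Big\} = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha - v^2/2} \sum_{k=0}^{\infty} \frac{(2\alpha)^{k/2} v^k}{k!}, $$

that of q is, by (11),

$$ h(q) = e^{-\lambda - q^2/2}\, q\, (\tfrac{1}{2}q^2)^{n/2 - 1} \sum_{j=0}^{\infty} \frac{(\tfrac{1}{2}\lambda q^2)^j}{j!\,\Gamma(n/2 + j)}, $$

hence that of $u = v/q = n^{-1/2}t$ is

$$ \int_{0}^{\infty} h(q)\, g(uq)\, q\, dq, $$

which, after integration, reduces to

$$ \pi^{-1/2} e^{-(\alpha + \lambda)} \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \frac{\lambda^j (2\alpha^{1/2}u)^k}{j!\,k!} \cdot \frac{\Gamma(N/2 + j + k/2)}{\Gamma(n/2 + j)} \cdot (1 + u^2)^{-(2j + k + N)/2}. \tag{13} $$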




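In particular, if $\alpha = \bar{\mu} = 0$ then (13) reduces by means of the relation $F(a, \gamma, x) = e^{x}F(\gamma - a, \gamma, -x)$ to

$$ \frac{\Gamma(N/2)}{\pi^{1/2}\,\Gamma(n/2)} \cdot e^{-\lambda u^2/(1 + u^2)} \cdot (1 + u^2)^{-N/2} \cdot F\left(-\tfrac{1}{2},\, \tfrac{n}{2},\, -\lambda(1 + u^2)^{-1}\right), \tag{14} $$

from which it follows that the frequency function of t is given by (4).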

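Again, let $x_1, \cdots, x_{N_1+N_2}$ be independent normal variates with the same variance $\sigma^2$ and with means $\mu_1, \cdots, \mu_{N_1+N_2}$ respectively. Set $n_1 = N_1 - 1$, $n_2 = N_2 - 1$, $n = n_1 + n_2$, and let

$$ \bar{x}_1 = \sum_{1}^{N_1} x_i/N_1, \qquad \bar{x}_2 = \sum_{N_1+1}^{N_1+N_2} x_i/N_2, $$

$$ s_1^2 = \sum_{1}^{N_1} (x_i - \bar{x}_1)^2/n_1, \qquad s_2^2 = \sum_{N_1+1}^{N_1+N_2} (x_i - \bar{x}_2)^2/n_2, \tag{15} $$

$$ s^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2), \qquad t = [N_1 N_2/(N_1 + N_2)]^{1/2}(\bar{x}_1 - \bar{x}_2)/s. $$

If all the $\mu_i$ are equal then t again has Student's distribution with n degrees of freedom. In the general case let

$$ \bar{\mu}_1 = \sum_{1}^{N_1} \mu_i/N_1, \qquad \bar{\mu}_2 = \sum_{N_1+1}^{N_1+N_2} \mu_i/N_2, $$

$$ \beta_1^2 = \sum_{1}^{N_1} (\mu_i - \bar{\mu}_1)^2/N_1, \qquad \beta_2^2 = \sum_{N_1+1}^{N_1+N_2} (\mu_i - \bar{\mu}_2)^2/N_2. \tag{16} $$

Then we may show as before [1, p. 388] that in this case $u = n^{-1/2}t$ has the frequency function (13), where now

$$ N = N_1 + N_2 - 1, \qquad \lambda = (N_1\beta_1^2 + N_2\beta_2^2)/2\sigma^2, \qquad \alpha = [N_1 N_2/(N_1 + N_2)](\bar{\mu}_1 - \bar{\mu}_2)^2/2\sigma^2. \tag{17} $$

In particular, when $\alpha = \bar{\mu}_1 - \bar{\mu}_2 = 0$, so that $\bar{\mu}_1 = \bar{\mu}_2 = \bar{\mu}$, say, the frequency function $f_{n,\lambda}(t)$ of t is again given by (4), where now

$$ \lambda = \sum_{1}^{N_1+N_2} (\mu_i - \bar{\mu})^2/2\sigma^2. \tag{18} $$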


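Extensions in this direction to the general linear hypothesis in the analysis of variance will not be treated here.

If we set

$$ w = (1 + t^2/n)^{-1}, \tag{19} $$

where t has the frequency function (4), then w will have the frequency function

$$ g_{n,\lambda}(w) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n}{2}\right)} \cdot e^{-\lambda(1 - w)} \cdot w^{\frac{1}{2}n - 1}(1 - w)^{-\frac{1}{2}} \cdot F\left(-\tfrac{1}{2},\, \tfrac{n}{2},\, -\lambda w\right) \tag{20} $$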



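for $0 < w < 1$. Thus for every t,

$$ 1 - \int_{-t}^{t} f_{n,\lambda}(x)\, dx = \int_{0}^{(1 + t^2/n)^{-1}} g_{n,\lambda}(w)\, dw. \tag{21} $$

It would be interesting to have numerical values of the integral on the left side of (21) for that value of t for which

$$ 1 - \int_{-t}^{t} f_{n,0}(x)\, dx = 0.01 \text{ or } 0.05 \text{ (say)}, \tag{22} $$

but existing tables (e.g., those in [2] and [3]) of the integral of (20) were compiled for a different purpose and do not supply this information. The following remarks throw some light on this subject.

Let us set

$$ \begin{aligned} R(t) &= f_{n,\lambda}(t)/f_{n,0}(t) = \exp\left\{\frac{-\lambda t^2}{n + t^2}\right\} \cdot F\left(-\tfrac{1}{2},\, n/2,\, -\lambda(1 + t^2/n)^{-1}\right) \\ &= \{1 - \lambda t^2/(n + t^2) + o(\lambda)\} \cdot \{1 + \lambda/(n + t^2) + o(\lambda)\} \\ &= 1 + \lambda(n + t^2)^{-1}(1 - t^2) + o(\lambda). \end{aligned} \tag{23} $$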

Then as $\lambda \to 0$ we have ultimately

$$ R(t) > 1 \text{ if } |t| < 1, \qquad R(t) < 1 \text{ if } |t| > 1. \tag{24} $$

Hence for any $t > 1$ and for sufficiently small $\lambda$,

$$ 1 - \int_{-t}^{t} f_{n,\lambda}(x)\, dx < 1 - \int_{-t}^{t} f_{n,0}(x)\, dx. \tag{25} $$

The exact range of values of t for which $R(t) < 1$ depends of course on n and $\lambda$. However we shall show that always

$$ R(t) < 1 \text{ if } |t| > 1, \tag{26} $$

so that (25) holds for all n and $\lambda > 0$, provided $t > 1$. The proof is as follows.
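In terms of w we have

$$ R(t) = e^{-\lambda(1 - w)} F(-\tfrac{1}{2},\, n/2,\, -\lambda w) = e^{-\lambda} F((n + 1)/2,\, n/2,\, \lambda w). \tag{27} $$

Now

$$ F((n + 1)/2,\, n/2,\, \lambda w) = 1 + \sum_{k=1}^{\infty} \frac{(n + 1)(n + 3) \cdots (n + 2k - 1)}{n(n + 2) \cdots (n + 2k - 2)} \cdot \frac{(\lambda w)^k}{k!}, \tag{28} $$

and by induction on k we may show that for all $k = 1, 2, \cdots$,

$$ \frac{(n + 1)(n + 3) \cdots (n + 2k - 1)}{n(n + 2) \cdots (n + 2k - 2)} \le 1 + k/n, \tag{29} $$

where the equality holds only for $k = 1$. Hence

$$ F((n + 1)/2,\, n/2,\, \lambda w) < 1 + \sum_{k=1}^{\infty} (1 + k/n) \cdot (\lambda w)^k/k! = e^{\lambda w}(1 + \lambda w/n), \tag{30} $$

$$ R(t) < e^{-\lambda(1 - w)} \cdot (1 + \lambda w/n) < e^{-\lambda(1 - w) + \lambda w/n}. \tag{31} $$

Hence $R(t) < 1$ if $w < n/(n + 1)$, which is equivalent to (26).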
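As a numerical illustration of (26), the ratio R(t) can be evaluated from either form in (27) by summing the confluent hypergeometric series term by term. The following is only a sketch under stated assumptions: the function names, the truncation at 400 terms, and the sample values of n, λ and t are illustrative choices and are not part of the original argument.

```python
import math


def kummer_series(a, b, x, terms=400):
    """Partial sum of the confluent hypergeometric series F(a, b, x).

    The truncation point `terms` is an illustrative assumption; it is
    ample for the moderate arguments used here.
    """
    total, term = 1.0, 1.0
    for k in range(terms):
        # ratio of consecutive terms of the series
        term *= (a + k) / (b + k) * x / (k + 1)
        total += term
    return total


def density_ratio(t, n, lam):
    """R(t) from the second form in (27): e^{-lam} F((n+1)/2, n/2, lam*w)."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam) * kummer_series((n + 1) / 2.0, n / 2.0, lam * w)


def density_ratio_alt(t, n, lam):
    """R(t) from the first form in (27), used only as a cross-check."""
    w = 1.0 / (1.0 + t * t / n)
    return math.exp(-lam * (1.0 - w)) * kummer_series(-0.5, n / 2.0, -lam * w)


if __name__ == "__main__":
    for n in (2, 5, 20):
        for lam in (0.1, 1.0, 5.0):
            for t in (1.01, 2.0, 4.0):
                print(n, lam, t, density_ratio(t, n, lam))
```

For |t| > 1 every printed ratio should fall below 1, in line with (26); for |t| < 1 the ratio exceeds 1 only when λ is small enough, as (24) indicates.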


REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 127-149.

[3] Emma Lehmer, "Inverse tables of probabilities of errors of the second kind," Annals of Math. Stat., Vol. 15 (1944), pp. 388-398.


A DISTRIBUTION-FREE CONFIDENCE INTERVAL FOR THE MEAN 

By Louis Guttman 
Cornell University 

1. Summary. Consider a random sample of N observations $x_1, x_2, \cdots, x_N$, from a universe of mean $\mu$ and variance $\sigma^2$. Let m and $s^2$ be the sample mean and variance respectively:

$$ m = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - m)^2. \tag{1} $$

It is shown that the following conservative confidence interval holds for $\mu$:

$$ \text{Prob}\,\{(m - \mu)^2 \le s^2/(N - 1) + \lambda\sigma^2\sqrt{2/N(N - 1)}\} > 1 - \lambda^{-2}, \tag{2} $$

where $\lambda$ is any positive constant. Inequality (2) also holds if, in the braces, $\lambda$ is replaced by $\sqrt{\lambda^2 - 1}$, with $\lambda \ge 1$.

Inequality (2) is much more efficient on the average than Tchebychef's inequality for the mean, namely,

$$ \text{Prob}\,\{(m - \mu)^2 \le \lambda^2\sigma^2/N\} > 1 - \lambda^{-2}; \tag{3} $$

yet (2) and (3) are both distribution-free, requiring only knowledge about $\sigma^2$. At the $1 - \lambda^{-2} = .99$ level of confidence, the expected value of the right member in the braces of (2) is only about 1/6 the corresponding member of (3); at the .999 level of confidence the ratio is about 1/20.
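A small simulation can illustrate that the event in the braces of (2) occurs at least as often as the bound $1 - \lambda^{-2}$ requires. This is a minimal sketch, assuming an exponential universe with mean 1 and variance 1; that universe, the sample size, the number of trials and the function names are illustrative assumptions only, and a simulation of this kind is a check, not a proof.

```python
import math
import random


def guttman_covers(sample, mu, sigma2, lam):
    """True when (m - mu)^2 <= s^2/(N-1) + lam*sigma2*sqrt(2/(N(N-1)))."""
    n_obs = len(sample)
    m = sum(sample) / n_obs
    # sample variance with divisor N, as in definition (1)
    s2 = sum((x - m) ** 2 for x in sample) / n_obs
    bound = s2 / (n_obs - 1) + lam * sigma2 * math.sqrt(2.0 / (n_obs * (n_obs - 1)))
    return (m - mu) ** 2 <= bound


def estimate_coverage(n_obs=10, lam=2.0, trials=5000, seed=0):
    """Fraction of simulated samples for which the event in (2) holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # exponential universe: mean 1, variance 1 (an assumed example)
        sample = [rng.expovariate(1.0) for _ in range(n_obs)]
        if guttman_covers(sample, mu=1.0, sigma2=1.0, lam=lam):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    lam = 2.0
    print("observed:", estimate_coverage(lam=lam), "bound:", 1.0 - lam ** -2)
```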





A more general inequality than (2) is developed, also involving only the single parameter $\sigma^2$.

2. Derivation. Consider the function

$$ u = (m - \mu)^2 - s^2/(N - 1) - c\sigma^2, \tag{4} $$

where c is an arbitrary constant. It is easily verified that $Eu = -c\sigma^2$, and that

$$ Eu^2 = \sigma^4[2/N(N - 1) + c^2]. \tag{5} $$

A basic feature of (5) is that the only population parameter in the right member is $\sigma^2$. Contrary to what might have been surmised, the fourth moment of x about $\mu$ is not involved, and indeed need not exist.

According to Tchebychef's inequality,

$$ \text{Prob}\,\{-\lambda\sqrt{Eu^2} \le u \le \lambda\sqrt{Eu^2}\} > 1 - \lambda^{-2}, \tag{6} $$

where $\lambda$ is an arbitrary positive number. Using (4) and (5), it is possible to write (6) as:

$$ \text{Prob}\,\{s^2/(N - 1) + \sigma^2[c - \lambda\sqrt{2/N(N - 1) + c^2}] \le (m - \mu)^2 \le s^2/(N - 1) + \sigma^2[c + \lambda\sqrt{2/N(N - 1) + c^2}]\} > 1 - \lambda^{-2}. \tag{7} $$

In the braces of (7), if the left member is negative, there is no harm in replacing it by zero; if it is positive, then replacing it by zero may only increase the probability of the braces. Regardless of the value of this left member, it is true that

$$ \text{Prob}\,\{(m - \mu)^2 \le s^2/(N - 1) + \sigma^2[c + \lambda\sqrt{2/N(N - 1) + c^2}]\} > 1 - \lambda^{-2}. \tag{8} $$

If we set $c = 0$, we have inequality (2). Some improvement over (2) is obtained by determining c to minimize the right member in the braces of (8), yielding as the shortest confidence interval:

$$ \text{Prob}\,\{(m - \mu)^2 \le s^2/(N - 1) + \sigma^2\sqrt{2(\lambda^2 - 1)/N(N - 1)}\} > 1 - \lambda^{-2}. \tag{9} $$

Inequality (9) differs from (2) only by replacing $\lambda$ in the braces by $\sqrt{\lambda^2 - 1}$.
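The minimization over c that leads from (8) to (9) can be checked numerically. The sketch below compares a brute-force minimum of $c + \lambda\sqrt{2/N(N-1) + c^2}$ (the coefficient of $\sigma^2$ in (8)) with the closed form $\sqrt{2(\lambda^2 - 1)/N(N - 1)}$ appearing in (9). The grid search, its range and step, and the function names are assumptions made for illustration.

```python
import math


def right_member(c, lam, n_obs):
    """Coefficient of sigma^2 in (8): c + lam*sqrt(2/(N(N-1)) + c^2)."""
    a = 2.0 / (n_obs * (n_obs - 1))
    return c + lam * math.sqrt(a + c * c)


def closed_form_minimum(lam, n_obs):
    """Coefficient of sigma^2 in (9): sqrt(2(lam^2 - 1)/(N(N-1)))."""
    return math.sqrt(2.0 * (lam * lam - 1.0) / (n_obs * (n_obs - 1)))


def grid_minimum(lam, n_obs, steps=200000, span=1.0):
    """Brute-force minimum of right_member over c in [-span, span]."""
    return min(
        right_member(-span + 2.0 * span * i / steps, lam, n_obs)
        for i in range(steps + 1)
    )


if __name__ == "__main__":
    lam, n_obs = 3.0, 10
    print(grid_minimum(lam, n_obs), closed_form_minimum(lam, n_obs))
    print("value at c = 0:", right_member(0.0, lam, n_obs))
```

The minimizing c is negative, which is why the choice c = 0 in (2) is slightly conservative relative to (9).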

3. Comparison with Tchebychef's inequality. The expected value of the right member of the braces in (2) is

$$ \sigma^2[1/N + \lambda\sqrt{2/N(N - 1)}]. \tag{10} $$

The ratio of (10) to the corresponding value of Tchebychef's inequality (3), namely $\lambda^2\sigma^2/N$, is

$$ [1 + \lambda\sqrt{2N/(N - 1)}]/\lambda^2. \tag{11} $$

Since (11) decreases as $\lambda$ increases, the efficiency of inequality (2) increases compared with that of Tchebychef as the level of confidence $1 - \lambda^{-2}$ increases. The squared interval of (2) involves only the first power of $\lambda$, while that of (3) involves the second power.
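The ratio (11) is easy to tabulate. The following sketch evaluates it at the two confidence levels quoted in the Summary; the function names and the choice N = 100 are illustrative assumptions.

```python
import math


def efficiency_ratio(lam, n_obs):
    """Ratio (11): [1 + lam*sqrt(2N/(N-1))] / lam^2."""
    return (1.0 + lam * math.sqrt(2.0 * n_obs / (n_obs - 1))) / (lam * lam)


def lam_for_confidence(level):
    """Solve 1 - lam^(-2) = level for lam."""
    return 1.0 / math.sqrt(1.0 - level)


if __name__ == "__main__":
    for level in (0.99, 0.999):
        lam = lam_for_confidence(level)
        print(level, lam, efficiency_ratio(lam, 100))
```

With N = 100 the ratio is roughly 0.15 at the .99 level and roughly 0.046 at the .999 level, consistent with the "about 1/6" and "about 1/20" quoted in the Summary.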

4. Approach to normality. If the fourth moment of the universe's distribution exists, then it is well known that the ratio of $E(m - \mu)^4$ to $[E(m - \mu)^2]^2$ must approach 3 (the ratio for the normal distribution) as N increases. That is, if $a^2 + 1$ is the ratio, then $\lim_{N \to \infty} a^2 = 2$. It is known¹ that Tchebychef's inequality can be replaced by one involving both a and $\sigma^2$, and that

$$ \text{Prob}\,\{(m - \mu)^2 \le \sigma^2(1 + \lambda a)/N\} > 1 - \lambda^{-2}. \tag{12} $$

If $a^2 = 2$, then the right member in the braces of (12) becomes $\sigma^2(1 + \lambda\sqrt{2})/N$. This is virtually the same as (10), the expected value from (2). In a sense, then, (2) implicitly takes account of the fact that the distribution of sample means approaches that of the normal distribution with respect to the fourth moment. A striking feature, however, is that (2) holds for any $N > 1$ and does not even presume the fourth moment of the universe to exist, whereas to set $a^2 = 2$ in (12) in general requires a large N and finite universe fourth moment.

5. Further possibilities. Confidence interval (2) is derived from but one of a series of general intervals, each of which depends only on $\sigma^2$. It may be possible to derive from this series even more efficient intervals, according to the method now to be outlined.

One way of arriving at (2) is to consider all products of the form $(x_i - \mu)(x_j - \mu)$, where $i > j$ and $i, j = 1, 2, \cdots, N$. Let $p_2$ be the mean of these $N(N - 1)/2$ products. It can easily be seen that $p_2 = u$ in (4) with $c = 0$, so that $p_2$ is a second degree polynomial in $m - \mu$, the coefficients being sample statistics. A more general quadratic would be $u_2 = p_2 + c_1 p_1 + c_0$, where $c_1$ and $c_0$ are arbitrary constants and $p_1$ is the mean of the N values $(x_i - \mu)$, or $p_1 = m - \mu$. It is easily seen that $Ep_1 = Ep_2 = Ep_1p_2 = 0$, and that the only universe parameter involved in $Ep_1^2$ and $Ep_2^2$ is $\sigma^2$. Hence the only universe parameter upon which $Eu_2^2$ depends is also $\sigma^2$.

Higher degree polynomials in $m - \mu$ can be defined, possessing the same properties as $u_2$. Let $p_3$ be the mean of the $N(N - 1)(N - 2)/3!$ products of the form $(x_i - \mu)(x_j - \mu)(x_k - \mu)$, where $i > j > k$ and $i, j, k = 1, 2, \cdots, N$; etc., and let $p_N = (x_1 - \mu)(x_2 - \mu) \cdots (x_N - \mu)$. Set $p_0 = 1$, and let

$$ u_n = \sum_{a=0}^{n} c_{an}\, p_a \qquad (n = 1, 2, \cdots, N), \tag{13} $$

where the $c_{an}$ are arbitrary constants. It is easily seen that $Ep_a = 0$ $(a > 0)$, $Ep_ap_b = 0$ $(a \ne b)$, and that each $Ep_a^2$ depends on only the parameter $\sigma^2$ as far as the universe is concerned. Hence $Eu_n^2$ depends only on $\sigma^2$. Furthermore, by writing $x_i - \mu$ as $(x_i - m) + (m - \mu)$, it is seen that $p_a$ is a polynomial of degree a in $m - \mu$, the coefficients being sample statistics. From (13), then, $u_n$ is a polynomial of degree n in $m - \mu$ with statistics as coefficients.

¹ See, for example, Louis Guttman, "An inequality for kurtosis," Annals of Math. Stat., Vol. 19 (1948), pp. 277-278.
According to Tchebychef's inequality,

$$ \text{Prob}\,\{u_n^2 \le \lambda^2 Eu_n^2\} > 1 - \lambda^{-2}. \tag{14} $$

The interval for $u_n^2$ in the braces can be expressed in two statements:

$$ f_n(m - \mu) = u_n - \lambda\sqrt{Eu_n^2} \le 0, \tag{15} $$

$$ g_n(m - \mu) = u_n + \lambda\sqrt{Eu_n^2} \ge 0. \tag{16} $$

Both $f_n$ and $g_n$ are polynomials of degree n in $m - \mu$, $g_n$ exceeding $f_n$ always by the additive constant $2\lambda\sqrt{Eu_n^2}$. Let $q_n$ and $Q_n$ be the smallest and largest real zeros respectively of $f_n$, and let $r_n$ and $R_n$ be the smallest and largest real zeros respectively of $g_n$.

For convenience, we can suppose that $c_{nn}$, the coefficient of $(m - \mu)^n$ in $u_n$, is positive. If n is even, then $f_n$ is positive for $m - \mu > Q_n$ and for $m - \mu < q_n$. Hence the interval $q_n \le m - \mu \le Q_n$ contains all the points included in (15) and possibly more. Since the probability of (15) is not less than the probability of (14), we can write the following confidence interval:

$$ \text{Prob}\,\{q_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2} \qquad (n \text{ even}). \tag{17} $$

The problem remains to determine the $c_{an}$ so as to minimize the expected value of $Q_n - q_n$. Inequality (9) provides the minimum for the case $n = 2$. This can be verified by adding the term $c_1p_1$ to u in (4) and finding that the minimum requires $c_1 = 0$.

If n is odd, we again may set $c_{nn} > 0$. Then $f_n > 0$ for $m - \mu > Q_n$, and $g_n < 0$ for $m - \mu < r_n$. The interval $r_n \le m - \mu \le Q_n$ thus contains at least all the points found jointly in (15) and (16) and hence forms a conservative confidence interval:

$$ \text{Prob}\,\{r_n \le m - \mu \le Q_n\} > 1 - \lambda^{-2} \qquad (n \text{ odd}). \tag{18} $$

Again, the problem is to determine the $c_{an}$ so as to minimize the expected value of $Q_n - r_n$. Tchebychef's inequality (3) does this for the case $n = 1$.

Although the only population parameter involved throughout is $\sigma^2$, the sample moments up to the nth order are present in (15) and (16). It thus seems plausible that improvement over inequality (9) should be possible for $n > 2$. To obtain such an improvement requires developing a distribution-free theory of the zeros of $f_n$ and $g_n$ beyond the quadratic case.





ON THE COMPOUND AND GENERALIZED POISSON DISTRIBUTIONS 

By E. Cansado Maceda
University of Madrid 

1. Summary. In this note we deduce several properties of the compound and generalized Poisson distributions, in particular their closure and divisibility properties. An infinite class of functions whose members are both compound and generalized Poisson distributions is exhibited, and several of the distributions of Neyman, Polya, etc. are identified. The present note stems from a paper by Feller [2].

2. The compound Poisson distribution. If $F(x \mid a)$ is a family of distribution functions depending on the parameter a, and $U(a)$ is a distribution function such that it assigns zero probability to any a domain for which $F(x \mid a)$ is undefined, then

$$ G(x) = \int_{-\infty}^{\infty} F(x \mid a)\, dU(a) $$

is a distribution function. In particular if $F(x \mid a)$ is the Poisson distribution with mean a, and $U(0) = 0$, $G(x)$ is called the compound Poisson distribution associated with the distribution function $U(a)$, cf. Feller [2]. Clearly $G(x)$ is a step function over the non-negative integers, the saltus at the point $x = n$ being

$$ \pi_n = \int_{0}^{\infty} e^{-a}\frac{a^n}{n!}\, dU(a), \qquad n = 0, 1, 2, \cdots. $$

It is convenient to introduce the factorial moment generating function (f.m.g.f.) for $G(x)$ as follows:

$$ \omega(z) = E((1 + z)^x) = \sum_{n=0}^{\infty} \pi_n (1 + z)^n = \int_{0}^{\infty} e^{az}\, dU(a) = \phi(z), $$

where $\phi(z)$ is the ordinary moment generating function (m.g.f.) for $U(a)$. This gives a convenient relationship between the moments of $U(a)$ and its associated compound Poisson distribution.

On account of the multiplicative properties of $\omega(z)$ and $\phi(z)$ under the convolution of $G(x)$ and $U(a)$ respectively, it is seen that the compound Poisson distributions form a closed family, and if $G_1(x)$ and $G_2(x)$ are two compound Poisson distributions associated with $U_1(a)$ and $U_2(a)$ respectively then $G_1(x) * G_2(x)$ is associated with $U_1(a) * U_2(a)$. In addition, if $U(a)$ is infinitely divisible (cf. Cramér [1]) then $G(x)$ is also, since it can be factored into the convolution of arbitrarily many compound Poisson distributions.

Choosing in particular $U(a)$ as the Pearson type III distribution, the associated function is the Polya-Eggenberger distribution, and if $U(a)$ is a Poisson distribution the associated function is the Neyman contagious distribution of Type A.
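The identity between the f.m.g.f. of a compound Poisson distribution and the m.g.f. of its mixing law can be checked numerically in the Type A case, where $U(a)$ is itself a Poisson law. The sketch below is illustrative only: the mean `mu` of the mixing law, the truncation points of the two sums, and all function names are assumptions introduced for the example.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability e^{-mean} mean^k / k!, computed on the log scale."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def compound_saltus(n, mu, a_max=80):
    """pi_n = sum_a P{U = a} e^{-a} a^n / n!, U Poisson with mean mu.

    The sum over a is truncated at a_max (an assumption; the neglected
    tail is negligible for small mu).
    """
    return sum(poisson_pmf(a, mu) * poisson_pmf(n, a) for a in range(a_max + 1))


def fmgf_from_saltus(z, mu, n_max=300):
    """omega(z) = sum_n pi_n (1 + z)^n, truncated at n_max."""
    return sum(compound_saltus(n, mu) * (1.0 + z) ** n for n in range(n_max + 1))


def mgf_of_mixing_law(z, mu):
    """phi(z) = exp{mu (e^z - 1)}, the m.g.f. of a Poisson law with mean mu."""
    return math.exp(mu * (math.exp(z) - 1.0))


if __name__ == "__main__":
    for z in (-0.5, 0.0, 0.3):
        print(z, fmgf_from_saltus(z, 1.5), mgf_of_mixing_law(z, 1.5))
```

The two columns printed for each z should agree to many decimal places, which is the numerical content of the relation between $\omega(z)$ and $\phi(z)$.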

3. The generalized Poisson distribution. If $F(x \mid a)$, defined for non-negative integers $a = 0, 1, 2, \cdots$, is the a-fold convolution of a given distribution $F(x)$ with itself, i.e. $F(x \mid a) = F(x)^{*a}$, and $U(a)$ is the Poisson distribution with parameter $\alpha$, then the distribution function

$$ G(x) = \int_{0}^{\infty} F(x \mid a)\, dU(a) $$

is called the generalized Poisson distribution associated with $F(x)$.

If $\Omega(z)$ is the f.m.g.f. of $F(x)$ then for the f.m.g.f. of $G(x)$ we have

$$ \omega(z) = \sum_{n=0}^{\infty} e^{-\alpha}\frac{\alpha^n}{n!}\,[\Omega(z)]^n = \exp\{\alpha(\Omega(z) - 1)\}. $$

It follows that $\omega(z)$ can be written as $\prod_{\nu=1}^{n} \omega_\nu(z)$ where $\omega_\nu(z)$ is the f.m.g.f. of a generalized Poisson distribution, and thus $\omega(z)$ belongs to the infinitely divisible family. Moreover, if $G_1(x)$ and $G_2(x)$ are two generalized Poisson distributions associated with $U_1(a)$ and $U_2(a)$ with parameters $\alpha_1$ and $\alpha_2$ respectively, then $G(x) = G_1(x) * G_2(x)$ has for f.m.g.f.

$$ \omega_1(z)\,\omega_2(z) = \exp\left\{(\alpha_1 + \alpha_2)\left[\frac{\alpha_1\Omega_1(z) + \alpha_2\Omega_2(z)}{\alpha_1 + \alpha_2} - 1\right]\right\} $$

and $G(x)$ is again a generalized Poisson distribution function associated with the distribution

$$ U(a) = \frac{\alpha_1 U_1(a) + \alpha_2 U_2(a)}{\alpha_1 + \alpha_2} $$

and with the parameter $\alpha_1 + \alpha_2$. Thus the generalized Poisson distributions form a closed family. The analytic nature of the generalized Poisson distributions has been studied by Hartman and Wintner [3]. As noted by Feller [2] the various Neyman contagious distributions are generalized Poisson distributions.

4. Further remarks. From the above observations it is clear that a necessary and sufficient condition for a distribution to be a compound Poisson distribution is that its f.m.g.f. be of the form

$$ \omega_1(z) = \phi(z), \tag{1} $$

where $\phi(z)$ is the ordinary m.g.f. of a non-negative random variable. Likewise a necessary and sufficient condition for $\omega(z)$ to be the f.m.g.f. of a generalized Poisson distribution is that it be of the form

$$ \omega_2(z) = \exp\{\alpha(\Omega(z) - 1)\}, \qquad \alpha > 0, \tag{2} $$

where $\Omega(z)$ is the f.m.g.f. of an arbitrary distribution function $F(x)$. If we choose $\phi(z) = \exp\{\alpha(e^z - 1)\}$ and $\Omega(z) = e^z$, then $\omega_1(z) = \omega_2(z)$, and the distribution whose f.m.g.f. is $\omega_1(z)$ (the Neyman contagious distribution of Type A) is simultaneously a compound and a generalized Poisson distribution (cf. Feller [2]). We now show that there is an infinite class of distributions with this property.

First note that if $\phi(z)$ is the m.g.f. of an arbitrary distribution, then $\exp\{\alpha(\phi(z) - 1)\}$ is also the m.g.f. of a d.f., and in fact is the m.g.f. of the generalized Poisson distribution associated with the distribution whose m.g.f. is $\phi(z)$. Now let $\phi(z)$ be the m.g.f. of an arbitrary non-negative random variable, and define

$$ \omega(z) = \exp\{\alpha(\phi(z) - 1)\}, \qquad \alpha > 0. \tag{3} $$

Then $\omega(z)$ is simultaneously of the forms (1) and (2), since $\phi(z)$ is, by (1), also the f.m.g.f. of a distribution function, i.e. the compound Poisson distribution associated with the distribution whose m.g.f. is $\phi(z)$. However, not every distribution which is both a compound and a generalized Poisson distribution can be generated in this manner. For example, the Polya-Eggenberger distribution is easily shown to be both a generalized and a compound Poisson distribution, yet its f.m.g.f.

$$ \omega(z) = (1 - dz)^{-h/d}, \qquad d > 0, \quad h > 0, $$

manifestly is not of the form (3), since this would imply that $\phi(iz) = 1 - \frac{h}{\alpha d}\log(1 - diz)$ is a characteristic function. But $|\phi(iz)|$ is unbounded as $z \to \pm\infty$ and thus is not the characteristic function of a distribution.

REFERENCES 

[1] H. Cramér, "Problems in probability theory," Annals of Math. Stat., Vol. 18 (1947), pp. 165-193.

[2] W. Feller, "On a general class of contagious distributions," Annals of Math. Stat., Vol. 14 (1943), pp. 389-400.

[3] P. Hartman and A. Wintner, "On the infinitesimal generators of integral convolutions," Am. Jour. of Math., Vol. 64 (1942), pp. 272-279.


ON CONFIDENCE LIMITS FOR QUANTILES 

By Gottfried E. Noether
Columbia University 

In finding confidence limits for quantiles it is usual to determine two order statistics $Z_i$ and $Z_j$ which with a given probability contain the unknown quantile between them. The values of i and j corresponding to a given confidence coefficient can be determined with the help of the distribution laws of order statistics as is shown, e.g., in Wilks [1]. The purpose of this note is to determine i and j with the help of a confidence band for the unknown cumulative distribution function.

In what follows we shall always denote the cumulative distribution function (cdf) by $F(x)$, i.e., $F(x) = P\{X \le x\}$. Then the quantile $q_p$ is determined by

$$ F(q_p - 0) \le p \le F(q_p) \tag{1} $$

which reduces to

$$ F(q_p) = p \tag{1'} $$

if $F(x)$ is continuous. Given a sample of size n we can construct the sample cdf $F_n(x)$ defined by $F_n(x) = 1/n$ (number of observations $\le x$). Confidence coefficients will always be denoted by $1 - \alpha$.

Assume that we can construct two step functions $L(x)$ and $U(x)$ parallel to $F_n(x)$ such that for any fixed value x

$$ P\{L(x) < F(x) < U(x)\} = 1 - \alpha. \tag{2} $$

We do not require that the confidence band determined by $L(x)$ and $U(x)$ cover the graph of the unknown cdf $F(x)$ with probability $1 - \alpha$, but only that for any arbitrarily chosen value x (2) is true.
Let

$$ L(x) = \eta_k, \qquad U(x) = \theta_k $$

for $z_k \le x < z_{k+1}$, $k = 0, 1, \cdots, n$, where $z_i$ is the value taken by the order statistic $Z_i$ and $z_0 = -\infty$, $z_{n+1} = +\infty$. Then if $F(x)$ is continuous it follows from (2) that a confidence interval with confidence coefficient $1 - \alpha$ for $q_p$ is given by

$$ Z_i \le q_p < Z_j \tag{3} $$

where i and j are determined by

$$ \theta_{i-1} < p, \qquad \theta_i > p \tag{4} $$

$$ \eta_{j-1} < p, \qquad \eta_j > p. \tag{5} $$

It will be noted that (3) represents a half-open interval. However as long as we only admit continuous cdf's the confidence coefficient is not changed if we use

$$ Z_i < q_p < Z_j \tag{3'} $$

or

$$ Z_i \le q_p \le Z_j \tag{3''} $$

instead. This is no longer true if we also admit discontinuous cdf's. Then the confidence coefficient connected with (3') is $\le 1 - \alpha$, while that connected with (3'') is $\ge 1 - \alpha$, as follows immediately from consideration of the possible outcomes when (1) is true. This is the same result as that obtained by Scheffé and Tukey [2].

We shall now indicate how $\eta_k$ and $\theta_k$ can be obtained and find their values in a particular case. For any arbitrary value x we can consider $F_n(x)$ as the sample estimate of the unknown parameter $p = F(x)$ of a binomial distribution. Clopper and Pearson [3] have discussed how confidence intervals for the unknown parameter of a binomial variate can be found. Thus we can determine $\eta_k$ and $\theta_k$ correspondingly, but as is well known (2) cannot be achieved with probability exactly equal to $1 - \alpha$. We shall have to be satisfied with probability $\ge 1 - \alpha$. Consequently the same will hold true for the confidence coefficient connected with the confidence interval for $q_p$.

In many cases central confidence intervals seem to be more desirable, at least intuitively, than others. Our method produces such central confidence intervals for the unknown quantile if we use central confidence intervals in the construction of the confidence band. In that case $\eta_k$ and $\theta_k$ are determined by

$$ \frac{\alpha}{2} = I_{\eta_k}(k,\, n - k + 1) \tag{6} $$

$$ \frac{\alpha}{2} = I_{1 - \theta_k}(n - k,\, k + 1) \tag{7} $$

except that $\eta_0 = 0$, $\theta_n = 1$ by definition, where

$$ I_x(p, q) = \int_{0}^{x} t^{p-1}(1 - t)^{q-1}\, dt \Big/ \int_{0}^{1} t^{p-1}(1 - t)^{q-1}\, dt $$

is the incomplete beta function. Scheffé [4] has pointed out how the tables of percentage points of the incomplete beta function by C. M. Thompson, etc. [5] can be used to find $\eta_k$ and $\theta_k$.

We shall show now that in the case of the median M the solution based on (3)-(7) leads to the same confidence interval as that suggested originally by W. R. Thompson [6]. Thompson found that for $k < (n + 1)/2$

$$ P\{Z_k < M < Z_{n-k+1}\} = 1 - 2I_{1/2}(n - k + 1,\, k) \tag{8} $$

provided the unknown distribution had a continuous cdf. (8) can be used to maximize k under the condition that the right-hand side is $\ge 1 - \alpha$.
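The maximization of k just described can be carried out with binomial sums, since $I_{1/2}(n - k + 1, k)$ equals the probability that a binomial variate with parameters n and 1/2 is at most $k - 1$. The sketch below is illustrative: the function names are hypothetical, and the restriction $2k < n + 1$ is an assumption imposed so that the two order statistics are distinct.

```python
import math


def binom_cdf(k, n, p):
    """P{B <= k} for B binomial(n, p); returns 0 when k < 0."""
    return sum(
        math.comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(0, k + 1)
    )


def median_interval_coverage(n, k):
    """1 - 2 I_{1/2}(n-k+1, k), written as 1 - 2 P{B <= k-1}, B binomial(n, 1/2)."""
    return 1.0 - 2.0 * binom_cdf(k - 1, n, 0.5)


def largest_k(n, alpha):
    """Largest k with 2k < n + 1 whose coverage is at least 1 - alpha.

    Returns None when no such k exists (very small samples).
    """
    best = None
    for k in range(1, (n + 1) // 2 + 1):
        if 2 * k < n + 1 and median_interval_coverage(n, k) >= 1.0 - alpha:
            best = k
    return best


if __name__ == "__main__":
    n, alpha = 10, 0.05
    k = largest_k(n, alpha)
    print("k =", k, "coverage =", median_interval_coverage(n, k))
```

For n = 10 and α = 0.05 the search selects k = 2, i.e. the interval between the second smallest and second largest observations.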

We shall first show that our method leads to the same kind of a confidence interval, i.e., one with $i = l$, $j = n - l + 1$. This follows immediately from the fact that by (6) and (7)

$$ 1 - \theta_l = \eta_{n-l}. \tag{9} $$

For let

$$ \theta_{l-1} < \tfrac{1}{2} \quad \text{and} \quad \theta_l > \tfrac{1}{2}, \tag{10} $$

then by (9) $\eta_{n-l} < \tfrac{1}{2}$ and $\eta_{n-l+1} > \tfrac{1}{2}$.

It remains to be shown that k as determined by (8) equals l. This will be so if we can show that

$$ I_{1/2}(n - l + 1,\, l) < \frac{\alpha}{2} < I_{1/2}(n - l,\, l + 1). \tag{11} $$

Remembering that $I_x(p, q)$ is a monotonically increasing function of x we get with the help of (7) and (10)

$$ \frac{\alpha}{2} = I_{1 - \theta_{l-1}}(n - l + 1,\, l) > I_{1/2}(n - l + 1,\, l) $$

and

$$ \frac{\alpha}{2} = I_{1 - \theta_l}(n - l,\, l + 1) < I_{1/2}(n - l,\, l + 1) $$

which proves (11).

In conclusion it may be worth while pointing out that the formula

$$ P\{Z_i < q_p < Z_j\} = I_p(i,\, n - i + 1) - I_p(j,\, n - j + 1) $$

given, e.g., in Wilks [1] for the continuous case can be obtained by a slight modification of (6).
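The formula just quoted can be evaluated without tables by using the identity $I_p(i, n - i + 1) = P\{B \ge i\}$ for a binomial variate B with parameters n and p, so that the difference of the two incomplete beta functions equals $P\{i \le B \le j - 1\}$. The sketch below also searches for the pair (i, j) with the smallest index distance whose coverage reaches $1 - \alpha$. The function names and that selection rule are assumptions made for illustration; the rule is one of several reasonable ones and is not taken from the original.

```python
import math


def binom_pmf(x, n, p):
    """P{B = x} for B binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def order_stat_coverage(n, p, i, j):
    """I_p(i, n-i+1) - I_p(j, n-j+1), computed as P{i <= B <= j-1}."""
    return sum(binom_pmf(x, n, p) for x in range(i, j))


def narrowest_pair(n, p, alpha):
    """Pair (i, j), 1 <= i < j <= n, with smallest j - i and coverage >= 1 - alpha.

    Ties in j - i are broken in favour of the larger coverage.
    Returns None if no pair qualifies.
    """
    best = None
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            cov = order_stat_coverage(n, p, i, j)
            if cov >= 1.0 - alpha:
                key = (j - i, -cov)
                if best is None or key < best[0]:
                    best = (key, (i, j))
    return None if best is None else best[1]


if __name__ == "__main__":
    print(narrowest_pair(10, 0.5, 0.05))
    print(order_stat_coverage(20, 0.25, 2, 10))
```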


REFERENCES 

[1] S. S. Wilks, "Order statistics," Am. Math. Soc. Bull., Vol. 54 (1948), pp. 6-50.

[2] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.

[3] C. J. Clopper and E. S. Pearson, "The use of confidence or fiducial limits illustrated in the case of the binomial," Biometrika, Vol. 26 (1934), pp. 404-413.

[4] H. Scheffé, "Note on the use of the tables of percentage points of the incomplete beta function to calculate small sample confidence intervals for a binomial p," Biometrika, Vol. 33 (1944), p. 181.

[5] C. M. Thompson, E. S. Pearson, L. J. Comrie, and H. O. Hartley, "Tables of percentage points of the incomplete beta function," Biometrika, Vol. 32 (1941), pp. 151-181.

[6] W. R. Thompson, "On confidence ranges for the median and other expectation distributions for populations of unknown distribution form," Annals of Math. Stat., Vol. 7 (1936), pp. 122-128.


A LOWER BOUND FOR THE EXPECTED TRAVEL AMONG m RANDOM 

POINTS 

By Eli S. Marks 
Bureau of the Census 

In connection with cost determinations in sampling problems, it is frequently necessary to determine the amount of travel among m random sample points in an area. A lower bound for the expected value of this distance is found to be:

$$ \frac{\sqrt{A}}{2} \cdot \frac{m - 1}{\sqrt{m}}, $$

where A is the measure of the area from which the m random points are drawn.¹

If in a finite area S we locate m points at random (see Figure 1), we can trace a continuous path among the m points by starting at some point and connecting the points by line segments. The points can be connected in any order so that the path touches each point only once (unless it intersects itself at one of the random points). We are interested in a lower bound for the expected value of the length of the shortest of the $m!$ possible paths.

Fig. 1. m Random Points in S. We have above an area S in which m random points have been selected (with m = 14).

The shortest path among the m points consists of $m - 1$ "links" (line segments) between two points. Each link can be assigned to one of its end points, leaving some pre-designated point (e.g., the m-th point selected) with no link assigned. The link assigned to the i-th random point $(x_{(i)})$ must be no less than $r_{(i)}$, the distance from $x_{(i)}$ to the nearest of the other $(m - 1)$ points. If we denote the length of the shortest path by L:

$$ L \ge \sum_{i=1}^{m-1} r_{(i)}, $$

$$ E(L) \ge \sum_{i=1}^{m-1} E(r_{(i)}). $$

Let $E_x(r_{(i)})$ be the expected value of $r_{(i)}$ conditional upon $x_{(i)}$ falling at the point x in S and let $F(r \mid x)$ be the conditional distribution function of $r_{(i)}$ for $x_{(i)} = x$. Thus $F(r \mid x)$ is the conditional probability of $r_{(i)} \le r$ or the probability of one or more of the $(m - 1)$ random points other than $x_{(i)}$ falling inside a circle, $C_r$, with radius r and center at x (see Fig. 1). Then, we have:

$$ E_x(r_{(i)}) = \int r\, dF(r \mid x), $$

$$ F(r \mid x) = 1 - \left[\frac{M(S) - M(SC_r)}{M(S)}\right]^{m-1}, $$

where $M(S)$ and $M(SC_r)$ are the measures of S and $SC_r$, so that $\frac{M(SC_r)}{M(S)}$ is the probability of a random point in S falling into $C_r$.

¹ The lower bound obtained is similar in form to the expression for distance traveled among a set of random points used by Mahalanobis [2] and Jessen [1].

Let $A = M(S)$ and construct a circle C with center at x and radius $\rho = \sqrt{A/\pi}$. Then $M(C) = A = M(S)$. Let d be the distance from x to the nearest of $(m - 1)$ points selected at random from C and let $G(r)$ be the distribution function of d. Then we have:

$$ E(d) = \int r\, dG(r), $$

$$ G(r) = 1 - \left[\frac{M(C) - M(C_rC)}{M(C)}\right]^{m-1}. $$

For $r \le \rho$,

$$ M(C_rC) = M(C_r) \ge M(SC_r). $$

For $r > \rho$,

$$ M(C_rC) = M(C) = M(S) \ge M(SC_r). $$

Thus, since $M(C_rC) \ge M(SC_r)$, we have for all x in S:

$$ G(r) \ge F(r \mid x), $$

and thus,

$$ E(d) \le E_x(r_{(i)}). $$

Since $E(d) \le E_x(r_{(i)})$ for all x in S:

$$ E(d) \le E(r_{(i)}), $$

$$ (m - 1)E(d) \le \sum_{i=1}^{m-1} E(r_{(i)}) \le E(L). $$

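It only remains to evaluate $E(d)$, the expected distance from the center of a circle to the nearest of $(m - 1)$ random points. This can be done very easily by substituting in the expression for $G(r)$:

$$ A = M(C), $$

$$ \pi r^2 = M(C_rC), \quad \text{when } r \le \rho = \sqrt{A/\pi}, $$

to give:

$$ G(r) = 1 - \left(\frac{A - \pi r^2}{A}\right)^{m-1}, $$

$$ G'(r) = \frac{2\pi r}{A}(m - 1)\left(\frac{A - \pi r^2}{A}\right)^{m-2}, $$

$$ E(d) = \int_{0}^{\rho} r\, G'(r)\, dr = \frac{1}{2}\sqrt{\frac{A}{\pi}}\,[B(m, \tfrac{1}{2})], $$

where $B(m, \tfrac{1}{2})$ is the complete Beta function.

Since $\sqrt{m}\,[B(m, \tfrac{1}{2})] > \sqrt{\pi}$,

$$ E(d) > \frac{1}{2}\sqrt{\frac{A}{m}}. $$

Thus, we have:

$$ E(L) > \frac{\sqrt{A}}{2} \cdot \frac{m - 1}{\sqrt{m}}. $$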

It is obvious that the development is general and applies to m random points
in any bounded two-dimensional Borel set. However, the lower bound obtained
will, in general, be useful only when S is a connected region.
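
As a numerical illustration of the bound derived above, the following Python sketch evaluates both the intermediate form $(m-1)\,\tfrac{1}{2}\sqrt{A/\pi}\,B(m,\tfrac{1}{2})$ and the simplified form $(m-1)\sqrt{A}/(2\sqrt{m})$. The function names are hypothetical, and computing the Beta function through `math.lgamma` is an implementation choice made here, not something taken from the note.

```python
import math


def beta(a, b):
    """Complete Beta function B(a, b), computed via log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def expected_nearest_distance(m, area):
    """E(d) = (1/2) sqrt(A / pi) B(m, 1/2): center of a circle of area A
    to the nearest of (m - 1) points dropped at random in the circle."""
    return 0.5 * math.sqrt(area / math.pi) * beta(m, 0.5)


def path_lower_bounds(m, area):
    """Return the pair ((m - 1) E(d), (m - 1) sqrt(A) / (2 sqrt(m)))."""
    exact = (m - 1) * expected_nearest_distance(m, area)
    simplified = (m - 1) * math.sqrt(area) / (2.0 * math.sqrt(m))
    return exact, simplified


if __name__ == "__main__":
    # Illustrative values only; the unit area is an assumption of this example.
    for m in (2, 10, 100, 1000):
        exact, simplified = path_lower_bounds(m, area=1.0)
        print(m, round(exact, 6), round(simplified, 6))
```

The first value should always exceed the second, since the simplification replaces $B(m,\tfrac{1}{2})$ by the smaller quantity $\sqrt{\pi/m}$; the two approach each other as m grows.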

REFERENCES

[1] Raymond J. Jessen, "Statistical investigation of a sample survey for obtaining farm
facts," Iowa State College Research Bulletin 304 (1942).

[2] P. C. Mahalanobis, "A sample survey of the acreage under jute in Bengal," Sankhyā,
Vol. 4 (1940), pp. 511-530.


A MATRIX ARISING IN CORRELATION THEORY¹

By H. M. Bacon
Stanford University

1. Introduction. In the study of time series, it is frequently desirable to
consider correlations between observations made in different years. Let $x_{i1}$,
$x_{i2}, \ldots, x_{im}$ be m values of the variable $x_i$, expressed as deviations from their
arithmetic mean, where $x_i$ is a variable observed in the ith year $(i = 1, 2, \ldots, n)$.

¹ A linear correlogram is considered by Cochran in his paper, "Relative accuracy of
systematic and stratified random samples for a certain class of populations," (Annals of Math.
Stat., Vol. 17 (1946), pp. 164-177) in which $\rho_u = 1 - u/L$. Setting $u = |i - j|$ and $L = 1/p$, we
have the case considered above.





Let $\sigma_i$ be the standard deviation of $x_i$. If we denote by $r_{ij} = r_{ji}$ the correlation
of $x_i$ with $x_j$, and if we assume the $x_i$ to be normally distributed, then

$$z = \frac{1}{(2\pi)^{n/2}\,\sigma_1 \sigma_2 \cdots \sigma_n \sqrt{R}}
\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{R_{ij}}{R}\,\frac{x_i x_j}{\sigma_i \sigma_j}\right]$$

is the frequency function giving the distribution. Here $R$ is the determinant
$|r_{ij}|$ of the correlation coefficients, and $R_{ij}$ is the cofactor of the element $r_{ij}$
in this determinant.

We may make various assumptions regarding the behavior of the correlation
coefficients over the n years. One such assumption of some interest is that the
correlation coefficients diminish in such a way that

$$r_{ij} = r_{ji} = 1 - |i - j|\,p,$$

where p is a fixed positive number not greater than $2/(n-1)$. Under these
circumstances, we can evaluate $R$ and $R_{ij}$ in terms of n and p.


2. Evaluation of R. We may let $R(p)$ represent the determinant $R$ of order
n whose element in the ith row and jth column is $r_{ij} = r_{ji} = r_{n-i,n-j} =
r_{n-j,n-i} = 1 - |i - j|\,p$ where, for the purpose of evaluation, p is any real
number. Since each two-rowed minor of $R(p)$ is divisible by p, $R(p)$ is divisible
by $p^{n-1}$. Furthermore, since $R(p)$ is a polynomial in p of degree at most n, we
have

$$R(p) = Ap^{n} + Bp^{n-1} = p^{n-1}(Ap + B).$$

If we set $p = 1$ and $p = -1$, we find $A + B = R(1)$ and $R(-1) = (-1)^{n-1}(-A + B)$,
so that $-A + B = (-1)^{n-1}R(-1)$. By elementary methods we
find that $R(1) = 2^{n-2}(3 - n)$ and $R(-1) = (-1)^{n-1}2^{n-2}(n + 1)$. Hence

$$A + B = 2^{n-2}(3 - n)$$

and

$$-A + B = 2^{n-2}(n + 1).$$

Solving for A and B we find that

$$R = R(p) = 2^{n-2} p^{n-1}[2 - (n - 1)p].$$

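3. Evaluation of $R_{ij}$. Similar methods yield the following values for the
cofactors $R_{ij}$ of the elements of $R$:

$$R_{11} = R_{nn} = 2^{n-3} p^{n-2}[2 - (n - 2)p],$$

$$R_{22} = R_{33} = \cdots = R_{n-1,n-1} = 2^{n-2} p^{n-2}[2 - (n - 1)p],$$

$$R_{1n} = R_{n1} = 2^{n-3} p^{n-1},$$

$$R_{i,i+1} = R_{i+1,i} = -2^{n-3} p^{n-2}[2 - (n - 1)p],$$

otherwise,

$$R_{ij} = 0.$$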
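
The closed forms for $R$ and the cofactors $R_{ij}$ can be checked against direct computation for small n. The sketch below is one way to do so, using exact rational arithmetic; all function names are hypothetical, the elimination routine is a generic one rather than the "elementary methods" of the text, and the cofactor comparison is restricted to $n \ge 3$ so that the corner and adjacent off-diagonal cases do not coincide.

```python
from fractions import Fraction


def correlation_matrix(n, p):
    """Matrix with entries r_ij = 1 - |i - j| p, held as exact fractions."""
    p = Fraction(p)
    return [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def cofactor(matrix, i, j):
    """Cofactor of the (i, j) element, with 1-based indices."""
    minor = [row[:j - 1] + row[j:]
             for k, row in enumerate(matrix, start=1) if k != i]
    return (-1) ** (i + j) * determinant(minor)


def closed_form_R(n, p):
    """R = 2^(n-2) p^(n-1) [2 - (n-1) p]."""
    p = Fraction(p)
    return Fraction(2) ** (n - 2) * p ** (n - 1) * (2 - (n - 1) * p)


def closed_form_cofactor(n, p, i, j):
    """Closed forms for R_ij (1-based indices), assuming n >= 3."""
    p = Fraction(p)
    two = Fraction(2)
    if i == j:
        if i in (1, n):
            return two ** (n - 3) * p ** (n - 2) * (2 - (n - 2) * p)
        return two ** (n - 2) * p ** (n - 2) * (2 - (n - 1) * p)
    if abs(i - j) == 1:
        return -(two ** (n - 3)) * p ** (n - 2) * (2 - (n - 1) * p)
    if {i, j} == {1, n}:
        return two ** (n - 3) * p ** (n - 1)
    return Fraction(0)


if __name__ == "__main__":
    n, p = 5, Fraction(1, 4)  # illustrative values, p <= 2/(n - 1)
    m = correlation_matrix(n, p)
    print(determinant(m), closed_form_R(n, p))
    print(cofactor(m, 1, n), closed_form_cofactor(n, p, 1, n))
```

Because the arithmetic is exact, agreement is a matter of equality rather than tolerance, which makes a mis-transcribed exponent or sign easy to detect.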





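4. The frequency function. The quadratic form appearing in the exponent
in the expression for the frequency function can now be written as

$$\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{R_{ij}}{R}\,\frac{x_i x_j}{\sigma_i \sigma_j}
= \frac{2 - (n-2)p}{2p[2 - (n-1)p]}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_n^2}{\sigma_n^2}\right)
+ \frac{1}{p}\left(\frac{x_2^2}{\sigma_2^2} + \cdots + \frac{x_{n-1}^2}{\sigma_{n-1}^2}\right)$$

$$\qquad + \frac{1}{2[2 - (n-1)p]}\left(\frac{x_1 x_n}{\sigma_1 \sigma_n} + \frac{x_n x_1}{\sigma_n \sigma_1}\right)
- \frac{1}{2p}\left(\frac{x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2 x_1}{\sigma_2 \sigma_1} + \frac{x_2 x_3}{\sigma_2 \sigma_3} + \frac{x_3 x_2}{\sigma_3 \sigma_2} + \cdots + \frac{x_n x_{n-1}}{\sigma_n \sigma_{n-1}}\right)$$

$$= \frac{1}{p}\left[\frac{2 - (n-2)p}{2[2 - (n-1)p]}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_n^2}{\sigma_n^2}\right)
+ \sum_{i=2}^{n-1}\frac{x_i^2}{\sigma_i^2} - \sum_{i=1}^{n-1}\frac{x_i x_{i+1}}{\sigma_i \sigma_{i+1}}\right]
+ \frac{1}{2 - (n-1)p}\,\frac{x_1 x_n}{\sigma_1 \sigma_n}.$$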

5. Maximum likelihood. The expression $z$ is the likelihood of getting a
particular set of values of the variables $x_1, x_2, \ldots, x_n$. It is often important
to regard the $r_{ij}$ and the $\sigma_i$ as parameters and to determine them so that the
likelihood will be a maximum. If we assume $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$, then

$$z = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}\sqrt{R}}
\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{R_{ij}}{R}\,x_i x_j\right].$$

The question, in our case, now becomes: What values of p and $\sigma$ will make $z$
a maximum for given $x_i$? Necessary conditions are that $\partial z/\partial p = 0$ and
$\partial z/\partial \sigma = 0$.
Since $R_{ij}$ and $R$ are given in terms of p, the process of differentiation can be carried
out (first take the logarithm of $z$), and values of p and $\sigma$ necessary for a maximum
determined. It is, of course, possible that $z$ has no maximum, and the sufficiency
of these values must be tested. The computations for the general case are
laborious, though straightforward. Furthermore, because of the complicated
nature of the coefficients in the equation to be solved for p, the general solution
is not readily obtainable. This equation is, however, of third degree, and it can
be solved in any particular case.


TABLE OF NORMAL PROBABILITIES FOR INTERVALS OF VARIOUS 
LENGTHS AND LOCATIONS 

By W. J. Dixon
University of Oregon

1. Introduction. The probability associated with a particular finite range of
values is often desired. The usual tables of normal areas give values for $\int_0^x$ or
$\int_{-\infty}^{x}$, as in the table by Salvosa [1]. The WPA table [2] gives $\int_{-x}^{x}$. The author
has deposited with Brown University a table of $\int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l}$ for values of $x$ [0(.1)5.0]
and values of $l$ [0(.1)10.0]. The values in the table may be interpreted as the
probability that an observation from a normal population with unit variance
will fall in an interval of length $l$ whose midpoint is a distance $x$ from the mean.
These values can be obtained by a simple computation from the existing tables.
Since values were being used frequently, the present table was constructed.
Microfilm or photostat copies may be obtained upon request to the Brown
University Library.


2. Computation. The values were obtained by finding the difference between
the integrals $\int_{-\infty}^{x+\frac{1}{2}l}$ and $\int_{-\infty}^{x-\frac{1}{2}l}$ as given to six decimal places in Salvosa's table.
Being differences, the values are subject to an error of 1 unit in the sixth place.
For values of $x + \frac{1}{2}l$ greater than 5, the values can be obtained by computing

$$1 - \int_{-\infty}^{x-\frac{1}{2}l}.$$

The search for errors was aided by computing column sums, i.e.,


( 1 ) 


M rx,+it 1 

/ + O / , “ -5 


where i represents the row number and n represents the column number. For
example, n = 17 corresponds to the column for $l = 1.7$. The approximation becomes
poorer as n increases, but the sums were still useful for checking purposes.
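
The tabulated quantity is a difference of two cumulative normal integrals, so it can also be computed directly. The following sketch does this with the error function from Python's standard library instead of Salvosa's table; the function names are hypothetical and the printed grid is only a small illustrative corner of the ranges described above.

```python
import math


def normal_cdf(t):
    """Cumulative distribution function of the unit normal."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def interval_probability(x, length):
    """Probability that a unit-normal observation falls in the interval of
    the given length whose midpoint lies a distance x from the mean."""
    half = 0.5 * length
    return normal_cdf(x + half) - normal_cdf(x - half)


if __name__ == "__main__":
    # Six decimals, matching the precision quoted for the deposited table.
    for x in (0.0, 0.1, 0.2):
        row = [round(interval_probability(x, 0.1 * n), 6) for n in (1, 2, 17)]
        print(x, row)
```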


3. Example. The table has been used in studies of the expected proportion
of a line covered by intervals dropped on it according to some normal probability
function. Let $P_n(x)$ be the probability that the point $x$ is covered at least once
when n intervals are dropped on the x-axis. H. E. Robbins [3] gives the
expression:

$$E(F) = \frac{1}{L}\int_0^L P_n(x)\,dx, \tag{2}$$

for the expected proportion of a line of length $L$ covered at least once by these
intervals.

Let $f(x)\,dx$ be the probability that an interval falls with its center in $dx$ and $l$
be the length of the interval. The probability that a point $x$ will be covered by
one interval dropped on the x-axis is:

$$g(x) = \int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l} f(t)\,dt. \tag{3}$$

When n intervals are dropped, the probability that $x$ is covered at least once is:

$$P_n(x) = 1 - (1 - g(x))^{n}, \tag{4}$$





and

$$E(F) = 1 - \frac{1}{L}\int_0^L (1 - g(x))^{n}\,dx. \tag{5}$$

When k groups of $n_i$ intervals are dropped according to, say, normal distributions
with different means,

$$P_n(x) = 1 - \prod_{i=1}^{k} (1 - g_i(x))^{n_i}, \tag{6}$$

where

$$g_i(x) = \int_{x-\frac{1}{2}l}^{x+\frac{1}{2}l} f_i(t)\,dt, \tag{7}$$

and we obtain

$$E(F) = 1 - \frac{1}{L}\int_0^L \prod_{i=1}^{k} (1 - g_i(x))^{n_i}\,dx. \tag{8}$$

The values $g(x)$ are those given in the table and are useful in evaluating the
integrals in (5) and (8) by numerical methods.
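
A numerical evaluation of (5) might be sketched as follows. Several things here are assumptions made for illustration and are not taken from the note: the interval centers follow a unit-variance normal law with a chosen mean, the line is the segment from 0 to $L$, the integral is approximated by a midpoint rule, and all names are hypothetical.

```python
import math


def normal_cdf(t):
    """Cumulative distribution function of the unit normal."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def coverage_probability(x, length, mean=0.0):
    """g(x) of equation (3) when the interval center is normal(mean, 1)."""
    half = 0.5 * length
    return normal_cdf(x + half - mean) - normal_cdf(x - half - mean)


def expected_covered_fraction(n, length, line_length, mean=0.0, steps=1000):
    """Midpoint-rule approximation of equation (5) over [0, line_length]."""
    h = line_length / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (1.0 - coverage_probability(x, length, mean)) ** n
    return 1.0 - total * h / line_length


if __name__ == "__main__":
    # Illustrative parameters only.
    for n in (1, 5, 25):
        print(n, expected_covered_fraction(n, length=1.0, line_length=4.0, mean=2.0))
```

Equation (8) follows the same pattern, with the single power replaced by a product over the k groups.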


REFERENCES

[1] Luis R. Salvosa, "Tables of Pearson's Type III functions," Annals of Math. Stat., Vol.
1 (1930), p. 191.

[2] National Bureau of Standards, Tables of Probability Functions, Vol. 2 (1942).

[3] H. E. Robbins, "On the measure of a random set," Annals of Math. Stat., Vol. 15 (1944).


CORRECTION TO “A NOTE ON THE FUNDAMENTAL IDENTITY OF 
SEQUENTIAL ANALYSIS” 

By G. E. Albert
University of Tennessee

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), pp. 593-596),
the proof of Lemma 3 is incorrect. The following correct proof is due to
Mr. C. R. Blyth of the Institute of Statistics, University of North Carolina.
It is easy to establish the equation

$$P(n = N \mid F)[\varphi(t_0)]^{-N} = P(n = N \mid G)\,E_{n=N}[\exp(-t_0 Z_N) \mid G],$$

where $E_{n=N}(u \mid G)$ denotes the conditional expectation of $u$ under the condition
that $n = N$ for any fixed integer $N$. By Wald [2], equations (2.4) and (2.6),
there exists a finite constant $C$ independent of $N$ which dominates the expected
values $E_{n=N}[\exp(-t_0 Z_N) \mid G]$ for every $N$. Thus

$$P(n = N \mid F)[\varphi(t_0)]^{-N} \le C \cdot P(n = N \mid G). \tag{A}$$





By Stein's theorem [3], there is a positive number $t_1$ such that $E(\exp n t_1 \mid G)$ is
finite. But by (A),

$$E\{\exp n[t_1 - \log \varphi(t_0)] \mid F\} \le C \cdot E(\exp n t_1 \mid G),$$

and Lemma 3 is proved.


CORRECTION TO "ON THE CHARLIER TYPE B SERIES” 

By S. Kullback 
George Washington University 

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), p. 575),
the phrase "so that ... $R_1 > 1$" on lines 5 and 6 should be deleted. I am grateful
to Prof. Ralph P. Boas, Jr. for calling this to my attention.



ABSTRACTS OF PAPERS


Presented June 22-24, 1948 at the Berkeley Meeting of the Institute

1. Estimation of Parameters for Truncated Multinormal Distributions. Z. W.
Birnbaum, E. Paulson and F. C. Andrews, University of Washington.

Let $X_{(N)} = (X_1, \ldots, X_N)$ be an N-dimensional random variable with a
non-singular normal distribution, and let the expectations, variances and covariances of
$X_{p+1}, \ldots, X_N$ be known. A large sample of $X_{(N)}$ is available, obtained under some
side-condition on $(X_{p+1}, \ldots, X_N)$; this side-condition may be a truncation of any kind or, more
generally, a selection, i.e. imposing on $X_{p+1}, \ldots, X_N$ a probability-distribution different
from the original marginal distribution. A method is developed for estimating, from such a
large sample with a side condition, all the missing parameters of the original distribution of
$X_{(N)}$, that is, the expectations, variances and covariances of $X_1, \ldots, X_p$, and the
covariances $\sigma_{X_j X_k}$ for $j = 1, \ldots, p$ and $k = p + 1, \ldots, N$. This method does not require the
knowledge of the side-condition. (This paper was prepared under the sponsorship of the
Office of Naval Research.)

2. A Test of the Hypothesis that a Sample of Three Came from the Same Normal 
Distribution. Carl A. Bennett, General Electric Company, Hanford 
Works, Richland, Washington. 

In the control of the precision of chemical analyses performed in duplicate, a test sometimes
becomes necessary as to whether three determinations can reasonably be assumed to
have arisen from the same normal population. A critical region for testing this hypothesis
is given by $R > R_0$, where $R = D/d$, $D$ being the maximum and $d$ the minimum difference
between the three values, and $R_0$ is determined by integration over the upper tail of the
Cauchy distribution. It can easily be seen that this test is equivalent to a t-test between a
sample of one and a sample of two.

3. A Note on the Application of the Abbreviated Doolittle Solution to Non-
Orthogonal Analysis of Variance and Covariance. Carl A. Bennett,
General Electric Company, Hanford Works, Richland, Washington.

S. S. Wilks has shown that the sums of squares necessary to the tests commonly made in
non-orthogonal analyses of variance or covariance can in general be reduced to the ratio of
two determinants. If several determinantal operations are performed to remove the
singular principal minors, the abbreviated Doolittle solution yields these sums of squares
directly. A combination of this technique and the calculational methods advanced by
Wald and Yates greatly reduces the tedium of calculation in this type of analysis.

4. Yield Trials with Backcrossed Derived Lines of Wheat. G. A. Baker and
F. N. Briggs, University of California, Davis.

Strains of White Federation 38 and Baart 38 Wheats derived by backcrossing sufficient
to insure a high degree of homogeneity for all genetic factors were grown in conventional
yield trials. The results were somewhat contradictory and led to a critical examination of
such trials. The assumption that the deviations of yields in field trials from the specified
pattern are random with uniform variance and expectation zero is not sufficiently realistic.
We are led to consider a mathematical model which assumes a set of fertility levels upon



which a random element is superimposed. On the basis of this model it is possible to
account for the low observed correlations between residuals and plot yields. In such a
model the variance ratio F may be approximately unbiased but then its variance is smaller
than under conventional assumptions. On the other hand, the expected value of F may be
greater than one and sufficiently large so that "significant differences" between strains will
always be found due to the differences in fertility levels. In such cases the results of the
experiment may be misinterpreted. Transformations, in the ordinary sense of the word,
will not bring such data into conformity with the conventional model. In order to bring the
correlation between residuals and plot yields down to a sufficiently low level it is necessary
to concentrate most of the variation in fertility levels into a few plots. That this is not
unreasonable is borne out by agronomic observations. This model also explains the
absence of correlation between the yields of strains as determined in two different trials on
the same set of strains.


5. The Selection of the Largest of a Number of Means. Charles M. Stein,
University of California, Berkeley.


Suppose $X_{ij}$, $i = 1, \ldots, p$; $j = 1, 2, \ldots$ are independently normally distributed with
means $\xi_i + \eta_j$ and variances $\sigma_j^2$, where $\xi_i$, $\eta_j$ are unknown but the $\sigma_j^2$ are known. $\epsilon$, $\alpha$ are fixed
numbers with $0 < \epsilon$, $0 < \alpha < 1$. It is desired to select, by a sequential procedure in which
we take first the observations with second subscript 1, etc., an integer $M$ among $1, \ldots, p$
such that, for every $k = 1, \ldots, p$ and $\xi_1, \ldots, \xi_p, \eta_1, \eta_2, \ldots$ satisfying $\xi_k \ge \xi_j + \epsilon$ for all
$j \ne k$, $P(M = k) \ge 1 - \alpha$. In accordance with the following rule, one decides at each stage
(after the observations with second subscript n) to take no more observations with certain
first subscripts. For each $n = 1, 2, \ldots$ and each $l = 1, \ldots, p$ compute


z 


1 _ 

2 


Jaal O' J 



- 




where $\bar{X}_j$ is the average of the observations with second subscript $j$ and $t_j$ is the number of
such observations. Continue taking observations for those $l$ for which this
expression is greater than $(\ln \alpha)/\epsilon$ but not for the others. Eventually there will be at most
one subscript $l = 1, \ldots, p$ for which one continues to take observations, and if there is one
this is chosen to be $M$. If there is none, the $l$ for which the sum is largest is chosen to be $M$.
This procedure is a straightforward application of the Lemma on p. 146 of Wald's Sequential
Analysis, and generalizations can easily be found.


6. The Effect of Inbreeding on Height at Withers in a Herd of Jersey Cattle.
W. C. Rollins, S. W. Mead, and W. M. Regan, University of California,
Davis.

The data consist of measurements of height at withers of about 200 females for various
ages from one month to five years. The intensity of inbreeding as measured by Wright's
coefficient of inbreeding averaged 15 per cent and reached as high as 44 per cent in a few
cases.

An intra-sire covariance analysis of height and per cent of inbreeding was made for
various ages from the first month to the fifty-fourth month.

The results of the statistical analysis indicate that the inbred animals are shorter at one
month of age and grow more slowly up to about the sixth month than do the outcrossed
animals, but that from the sixth month on the inbreds begin to catch up with the outcrossed
so that at maturity there is no significant difference in height.





7. An Example of a Singular Continuous Distribution. Henry Scheffé,
University of California at Los Angeles.

Simple and "natural" examples of singular continuous probability distributions are of
pedagogical interest. They are trivially available in the k-variate case for $k > 1$. A
univariate example may be obtained from the notion of a sequence of independent trials of
an event with constant probability p of success, a notion familiar to the student and
indispensable in elementary probability theory. The (real-valued) random variable X is taken
to be the dyadic representation of the sequence of results (1 and 0, respectively, for success
and failure). It is known that X has a singular continuous distribution for $p \ne 0, \tfrac{1}{2}, 1$.
This result may be proved by using only the Tchebycheff inequality together with the
formulas for the mean and variance of the binomial distribution.

8. On the Theory of Some Non-Parametric Hypotheses. Erich L. Lehmann
and Charles Stein, University of California, Berkeley, California.

For two types of non-parametric hypotheses optimum tests are derived against certain
classes of alternatives. The two kinds of hypotheses are related and may be illustrated by
the following example: (1) The joint distribution of the variables $X_1, \ldots, X_m, Y_1, \ldots,
Y_n$ is invariant under all permutations of the variables; (2) the variables are independently
and identically distributed. It is shown that the theory of optimum tests for hypotheses
of the first kind is the same as that of optimum similar tests for hypotheses of the second
kind. Most powerful tests are obtained against arbitrary simple alternatives, and in a
number of important cases most stringent tests are derived against certain composite
alternatives. For the example (1), if the distributions are restricted to probability densities,
Pitman's test based on $\bar{y} - \bar{x}$ is most powerful against the alternatives that the X's and
Y's are independently normally distributed with common variance, and that $E(X_i) = \xi$,
$E(Y_j) = \eta$ where $\eta > \xi$. If $\eta - \xi$ may be positive or negative the test based on $|\bar{y} - \bar{x}|$
is most stringent. The definitions are sufficiently general that the theory applies to both
continuous and discrete problems, and that tied observations present no difficulties. It is
shown that continuous and discrete problems may be combined. Pitman's test, for example,
when applied to certain discrete problems, coincides with Fisher's exact test, and when
$m = n$ the test based on $|\bar{y} - \bar{x}|$ is most stringent for hypothesis (1) against a broad class of
alternatives which includes both discrete and absolutely continuous distributions.
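
The abstract does not describe how Pitman's test is computed. As a minimal sketch, assuming the usual permutation form of the test with the one-sided alternative $\eta > \xi$, the p-value is the proportion of ways of relabelling the pooled observations that give a value of $\bar{y} - \bar{x}$ at least as large as the one observed. The function name is hypothetical, and full enumeration is practical only for small m and n.

```python
from itertools import combinations


def pitman_permutation_p_value(x, y):
    """One-sided exact permutation p-value for the statistic ybar - xbar.

    With the pooled total and the group sizes fixed, ybar - xbar is an
    increasing function of the sum of the y-group, so sums are compared.
    """
    pooled = list(x) + list(y)
    n = len(y)
    observed = sum(y)
    tolerance = 1e-12 * max(1.0, abs(observed))  # guards float round-off
    count = 0
    total = 0
    for indices in combinations(range(len(pooled)), n):
        total += 1
        if sum(pooled[k] for k in indices) >= observed - tolerance:
            count += 1
    return count / total


if __name__ == "__main__":
    # Illustrative data only.
    print(pitman_permutation_p_value([1.2, 0.7, 1.9, 1.1], [2.4, 1.8, 3.0, 2.2]))
```

Tied observations need no special handling in this enumeration, which is in keeping with the remark in the abstract that ties present no difficulties.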

9. Concerning Compound Randomization in the Binary System. John E. Walsh,
Project Rand, Santa Monica, California.

Consider a set of binary digits. The numerical deviation from $\tfrac{1}{2}$ of the conditional
probability that a specified digit equals 0 is called the bias of that digit for the given
conditions on the remaining digits of the set. The maximum bias of the set is defined to be the
maximum of the biases of the digits of the set. A set of binary digits is called random if its
maximum bias is zero. Now consider an array of $(1 + t_1) \cdots (1 + t_k) \times n$ binary digits
such that the rows are statistically independent. A compounding method of obtaining a
set of $t_1 \cdots t_k n$ binary digits from the original array is presented. By suitable choices of
$k, t_1, \ldots, t_k$ the maximum bias of the compounded set can be made extremely small even
if the maximum bias of the original array is not small; this can be done so that
$t_1 \cdots t_k/(1 + t_1) \cdots (1 + t_k)$ is moderately large. Also a method is outlined for constructing an
approximately random binary digit table. This table has the property that the maximum
bias of a set of digits taken from the table is an increasing function of the number of digits
in the set.





10. A Multiple Decision Problem Arising in the Analysis of Variance. Edward
Paulson, University of Washington, Seattle.

In some applications of the analysis of variance, a procedure is required for classifying
varieties into 'superior' and 'inferior' groups. Consider K varieties, with $x_{i\alpha}$ the $\alpha$th
observation on the $i$th variety $(\alpha = 1, 2, \ldots, r;\ i = 1, 2, \ldots, K)$, let $\bar{x}_i = \sum_\alpha x_{i\alpha}/r$ and let $s^2$ be an
independent estimate of the variance. For the $i$th variety form the corresponding interval
$\left(\bar{x}_i - \lambda s/\sqrt{r},\ \bar{x}_i + \lambda s/\sqrt{r}\right)$. The superior group then consists of the variety with greatest
sample mean, together with those varieties whose corresponding intervals have at least one
point in common with the interval for the variety with the greatest mean. If all varieties
fall into one group, this group is labeled 'neutral' and the varieties are considered
homogeneous. To select $\lambda$, consider the relative importance of different incorrect classifications.
For a given $\lambda$, an explicit expression is found for $P(A)$, the probability the varieties will not
all be classified in one group when $m_1 = m_2 = \cdots = m_K$, where $m_i = E(\bar{x}_i)$; also explicit
expressions are found for $P(B_1)$ and $P(B_2)$, where $P(B_1)$ is the probability there will not be a
superior group consisting only of the Kth variety and $P(B_2)$ is the probability there will
not be a superior group consisting of at least the Kth variety, when $m_1 = m_2 = \cdots = m_{K-1} =
m$ and $m_K = m + \Delta$ $(\Delta > 0)$. Similar results are obtained for classifying K processes
according to their variances.

11. Recurrence Formulae for the Moments and Seminvariants of the Joint
Distribution of the Sample Mean and Variance. Olav Reiersøl, University
of Oslo, Norway.

Let $x_1, x_2, \ldots, x_n$ be independent and have the same distribution. We consider the
arithmetic mean $m$ and the variance $v = (1/(n-1)) \sum (x_i - m)^2$. Let $\kappa_{rs}$ denote the
seminvariants of the joint distribution of $m$ and $v$, and let the seminvariant generating operators
$K_i$ be defined by the equations: $\kappa_{r+1,s} = K_1 \kappa_{rs}$, $\kappa_{r,s+1} = K_2 \kappa_{rs}$, $K_i 1 = 0$, $K_i(PQ) = P(K_i Q) +
Q(K_i P)$. An operator which operates only on the first factor of a product shall be
denoted by a prime, and an operator which operates only on the second factor shall be
denoted by a double prime. We have the following general formula, valid for any parent
distribution K[\{n — l){Ki -h kqi 4- koi) ~ 2n{K[ -f -f k{J)1'[1 (koi — Wkjo) -|- 

n(iiiomo - I'Kio)] = 0. For s = 0, 1, 2, we obtain the formulae, - nma) = 0, 

K[[{n — 1) (ko 2 — WK 2 i) — 2nV2o] = 0, K[[(ji — — Wn) — 8n'‘{n — 1)k:i/(2o + 4n’(?i — 1) 

Kao - - l)x2o] = 0. 

12. The Problem of Identification in Factor Analysis. Olav Reiersøl,
University of Oslo, Norway.

The paper is concerned with the multiple factor analysis of L. L. Thurstone. Thurstone
has given criteria which he says are almost certain to constitute sufficient and more than
necessary conditions for uniqueness (i.e. identifiability) of a simple structure. It is shown
that Thurstone's criteria are not always sufficient, and conditions are derived which are
more nearly necessary and sufficient for the identifiability of a simple structure. Let A
be the matrix of factor loadings with n rows and r columns. When the communalities are
identifiable, the conditions will be: (1) Each column of A should have at least r zeros.
(2) Let us consider the submatrix B of A, consisting of all the rows which have zeros in the
kth column. Then, for $q = 1, 2, \ldots, r - 1$, there should for any combination of q columns
different from the kth, exist at least $q + 1$ rows of B containing non-zero elements in the q
columns. This should be true for any value of k.





13. Note on Distinct Hypotheses. Agnes Berger, Columbia University, New
York.

As was pointed out by Neyman, one of the difficulties which one may encounter when
devising a test to distinguish between two exhaustive and exclusive composite hypotheses
referring to the unknown distribution of a random vector X is the following: If $H_0$ states
that the true distribution function of X belongs to a set $\{F\}$ and $H_1$ that it belongs to a
set $\{G\}$, it may happen that to every Borel set W of the sample space there exists an element
$F_W$ in $\{F\}$ and an element $G_W$ in $\{G\}$ for which the probability of the sample point x falling
on W is the same and therefore independent of whether $H_0$ or $H_1$ is true. If this is the case
the pair $H_0$, $H_1$ is called non-distinct; otherwise they are called distinct. The existence of
non-distinct hypotheses is demonstrated by a simple example, $H_0$ consisting of one, $H_1$
of three suitably chosen step functions. It is shown however that if the sets $\{F\}$ and $\{G\}$
contain only continuous distribution functions and are at most enumerable then the pair
$H_0$, $H_1$ is distinct. Necessary and sufficient conditions for $H_0$ and $H_1$ to be distinct were
obtained jointly with Wald for an important class of hypotheses each containing a
continuum of alternatives.

14. Place of Statistical Sampling in the Education of Engineers. E. L. Grant,
Stanford University.

There is convincing evidence that many engineering problems could be solved better
with the aid of statistical methods than they are now solved without this aid. However,
few practising engineers or teachers of engineering have had any training in statistical
methods. As a result, those engineering problems which are in part statistical problems
are seldom recognized as such. Even in the field of industrial quality control, in which
successful applications of some of the simpler statistical techniques have been made in
many different industries, the surface has barely been scratched and a serious obstacle to
progress is the lack of a widespread appreciation of the statistics point of view among
design engineers, production engineers, inspection personnel, and management.

This condition might gradually be corrected if during the next few years instruction in
statistics should be introduced into all undergraduate engineering curricula. However,
some recent discussions touching on the subject of statistics instruction for engineering
students (e.g., the report on "The Teaching of Statistics" which appeared in the March
1948 issue of the Annals of Math. Stat.) have been most unrealistic regarding the amount of
statistics instruction which could be added to engineering curricula. These discussions
have suggested a full year of basic statistics followed by one or more courses in engineering
applications. Desirable as this arrangement might be from the point of view of the most
effective instruction in statistics, it is out of the question when considered in the light of
the many subjects which are needed in engineering curricula. Although undergraduate
engineering curricula have always been tighter than other curricula, the pressures today
are greater than ever before: for more time devoted to the humanistic-social stem, for
more time in basic mathematics and science, for introductory courses in various economic
and management subjects such as engineering economy, accounting, industrial relations,
business law, and industrial organization and management, and for more time in the various
departmental courses in engineering subjects in order to permit presentation of important
recent developments in engineering technology. Under these circumstances the most that
can be hoped for in the undergraduate program is a single statistics course for one term,
possibly three units for one semester or four units for one quarter. This should be
supplemented by additional statistics instruction for some graduate students in engineering.
A few engineering graduates should be encouraged to take graduate degrees in statistics
and to make careers in the field of applied statistics.





In a successful undergraduate statistics course for engineering students, the problems
and illustrations should be selected with two purposes in mind. One purpose, of course,
should be to develop the principles of probability and statistical method. The other,
equally important if these engineering graduates are to persuade their colleagues and
superiors to adopt the statistics point of view in approaching engineering problems, should
be to demonstrate how statistical method provides a useful guide to action in many different
engineering situations. Applications of statistics to industrial quality control provide
particularly good problems and illustrative examples which serve this second purpose.

15. Statistical Problems of Medical Diagnosis. Jerzy Neyman, University of
California, Berkeley.

"Diagnosis" is used to describe the outcome of a strictly defined test T, such as the
Wassermann test, which may lead to either of two possible outcomes, "positive" or "negative."
Cases contemplated are such that at the time the test T is performed it is impossible to
verify its verdict for certain and the best one can do is to repeat the test. It is postulated
that to each individual of a population there corresponds a probability p that the test T
will give a positive outcome. The value of p may vary from one individual to another.
It is presumed that as p increases, the illness in the patient increases. The problem of
comparison of two alternative tests and the problem of estimating the distribution of p reduce to
problems relating to the distribution of X = number of positive outcomes in n independent
diagnoses. The statistical machinery suggested is that of BAN estimates (Public Health
Reports, Vol. 62 (1947), p. 1449). The principal result reported is that, with the mathematical
model used in the paper quoted, the empirical variances of four BAN estimates computed
for 205 samples of 1000 elements each agreed reasonably with the theoretical asymptotic
values. Empirical distributions of three of these estimates did not show deviations from
normality. That of the fourth was non-normal. It seems therefore that the asymptotic
procedure of BAN estimates may be adequate for similar analyses.

16. Power of Certain Tests Relating to Medical Diagnosis. C. L. Chiang and
J. L. Hodges, Jr., University of California, Berkeley.

Associate with each individual in a population $\pi$ the probability p that he will be found
tubercular when examined by a standard X-ray technique. Yerushalmy and others [J. Am.
Med. Assn., Vol. 133 (1947), p. 359] performed 5 independent such diagnoses on each of 1256
persons. Neyman [Public Health Reports, Vol. 62 (1947), p. 1449] proposed a simple
four-parameter model for the distribution of p in $\pi$, estimated the parameters from the data of
Yerushalmy and others, and obtained a satisfactory fit. In the present paper, the work of
Neyman is paralleled with four new models, all giving satisfactory fit with the same data.
The five models differ considerably in shape, and in the number of repeated diagnoses which
they indicate to be necessary to detect a high proportion of those individuals having, say,
$p \ge 0.1$. Therefore further preliminary study seems indicated before one can design a mass
survey to detect a high proportion of such persons. The approximate power of the test
of the Neyman model is considered, using one of the other models as alternative. It is
found that to obtain power 0.7 with level of significance 0.06, it would be necessary to
diagnose 6290 individuals 5 times each.

17. Iterative Treatment of Continuous Birth Processes. T. E. Harris,
Project Rand, Santa Monica, California.

Random variables $z_n$ are defined by $z_0 = 1$, $P(z_1 = r) = p_r$, $r = 1, 2, \ldots$; if $z_n = k$,
$z_{n+1}$ is the sum of k independent variates, each distributed like $z_1$. Let $x = \sum r p_r < \infty$;



M s 


434 


ABaTRACTS OF PAPERS 


T^Pr < « ; 0 < j)i < 1. The genoiating function/(s) = S prS' is said to be C.I if iheie 

exists a family of generating functions/(s, t) witli/(s, 1) = /(s),/[/(s, = /(s, tf) for all 

nonnegative i and t' A necessary and sufficient condition that /(s) be C I la that the 
numbers Or, r = 2, 3, • • , be nonnegativo, the Or are determined recursively by requmng 

to 

that the power senes f(s) =—8 + 2 arS' satisfy formally the functional equation £(s)/'(s) = 


{[/(s)] The problem is connooted with classical works on iteration If f{s) is C.L, the 
given Markoff process can bo imbedded in a continuous birth process. If {(s) is given, the 
m.g f, c)!>(s) of the asymptotic distribution of the variate Hh/x" may be determined from the 


(r e'(i) 1 n ^ 

- 1) exp { / + :- dy >. Various properties 

f - 2/J J 


formula = (« 

responding distribution can be inferred from this expression 


of the I 
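
The embedding condition can be checked numerically on one familiar case. The sketch below assumes the geometric law P{z_1 = r} = p(1 - p)^(r-1), which is not mentioned in the abstract and is chosen only because its generating function has a closed form, and it assumes the composition law f[f(s, t), t'] = f(s, t + t'). The function names and the numerical values are illustrative.

```python
import math


def f_geometric(s: float, p: float) -> float:
    """Generating function of P{z_1 = r} = p * (1 - p)**(r - 1), r = 1, 2, ..."""
    return p * s / (1.0 - (1.0 - p) * s)


def f_family(s: float, t: float, p: float) -> float:
    """Candidate family f(s, t): the same form with p replaced by p**t."""
    a = p ** t
    return a * s / (1.0 - (1.0 - a) * s)


def check_embedding(p: float = 0.4, s: float = 0.3,
                    t: float = 0.7, u: float = 1.6) -> bool:
    """Numerically check f(s, 1) = f(s) and f[f(s, t), u] = f(s, t + u)."""
    same_at_one = math.isclose(f_family(s, 1.0, p), f_geometric(s, p))
    composes = math.isclose(f_family(f_family(s, t, p), u, p),
                            f_family(s, t + u, p))
    return same_at_one and composes


if __name__ == "__main__":
    print(check_embedding())
```

For this family the composition reduces to multiplying the parameters p**t and p**u, so the check holds for any nonnegative t and u, not only the values used here.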


18. Estimation of Means on the Basis of Preliminary Tests of Significance. 
Blair M. Bennett, University of California, Berkeley. 


This paper examines the statistical procedure of pooling two sample means on the basis
of the results of one or more preliminary tests of significance. Let $x_i$, ($i = 1, \cdots, N_1$),
represent a sample of $N_1$ observations from a normal population $\pi_1(\xi, \sigma_1)$, and $y_j$ a sample of
$N_2$ observations from a second normal population. An estimate of $\xi$ which is commonly used in
certain practical situations is given by $x' = \bar{x}$, or $x' = (N_1\bar{x} + N_2\bar{y})/(N_1 + N_2)$, according as
the sample means $\bar{x}$, $\bar{y}$ do or do not differ significantly on the basis of a preliminary test. The
distribution of the estimate $x'$ is determined, according as $\sigma_1 = \sigma_2$ are known or unknown. In both
situations, the maximum (or minimum) bias is computed as a function of various levels of
significance of the preliminary test of equality of means. Also, the mean square error of
the estimate $x'$ is calculated in both cases. If now equality of variances cannot be assumed,
but an F-test of the sample variances $s_1^2$, $s_2^2$ does not indicate any significant difference,
then in practice $\bar{x}$, $\bar{y}$ may be pooled, the weights being inversely proportional to the sample
variances. Thus, the usual estimate of $\xi$ will be of the form: $x' = \bar{x}$, or
$x' = (N_1\bar{x}/s_1^2 + N_2\bar{y}/s_2^2)/(N_1/s_1^2 + N_2/s_2^2)$, according as $\bar{x}$ and $\bar{y}$ do or do not differ
significantly on the basis of the Student t-test, subsequent to an F-test. The bias and mean square
error of this estimate have been computed with the aid of the conditional power function of the
t-test subsequent to an F-test.
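
The simplest case of the pooling rule can be sketched as follows, assuming a known common standard deviation and a two-sided normal test as the preliminary test. The function name, the default level 0.05 and the choice of test are assumptions made for illustration; the abstract does not specify them.

```python
from math import sqrt
from statistics import NormalDist, fmean


def pooled_after_pretest(x, y, sigma, alpha=0.05):
    """Estimate the mean of x's population after a preliminary test.

    x, y  -- the two samples (sequences of numbers)
    sigma -- known common standard deviation (assumed)
    alpha -- level of the two-sided preliminary test of equal means
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    # Normal test statistic for equality of the two means.
    z = (xbar - ybar) / (sigma * sqrt(1.0 / n1 + 1.0 / n2))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if abs(z) > z_crit:
        # Means differ significantly: use the first sample alone.
        return xbar
    # Otherwise pool the two means with weights n1 and n2.
    return (n1 * xbar + n2 * ybar) / (n1 + n2)


if __name__ == "__main__":
    print(pooled_after_pretest([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], sigma=1.0))
    print(pooled_after_pretest([1.0, 2.0, 3.0], [10.0, 11.0, 12.0], sigma=1.0))
```

Because the returned value switches between two formulas depending on the data, its bias and mean square error depend on the level of the preliminary test, which is the dependence the paper computes.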

19. Note on Power of the F Test. Stanley W. Nash, University of California,
Berkeley.


Assuming “treatment” expectations to be normal random variables, the ratio of the
sum of squares due to treatments to the sum of squares due to error has a central F
distribution in the cases of randomized blocks, Latin squares, and one-way classifications.
The F statistic converges in probability to a constant as the number of treatments is
increased. This is one plus a multiple of the variance between treatment expectations.
The power of the F test increases monotonely to one as the number of treatments is
increased. This power can be calculated using tables of the incomplete beta function.


20. Best Asymptotically Normal Estimates. E. W. Barankin and J.
Gurland, University of California, Berkeley.

The methods of minimum $\chi^2$ developed by Neyman for obtaining BAN (best asymptotically
normal) estimates of the parameters appearing in the multinomial distribution
are generalized to obtain certain optimum types of estimates in the case of an arbitrary
distribution under certain restrictions. Let the random vector $X$ have the probability
density $v(x, \theta)$ in the absolutely continuous case and let $v(x, \theta) = P[X = x \mid \theta]$ in the discrete
case, where $\theta$ is a fixed vector in the parameter space. Functions $\phi_i(X)$, ($i = 1, 2, \cdots, r$)
are selected for the purpose of forming estimates; these estimates are taken to be functions
of the sample moments $\frac{1}{n}\sum_{j=1}^{n} \phi_i(X_j)$. Certain quadratic forms which depend on the
choice of functions $\phi_1(X), \phi_2(X), \cdots, \phi_r(X)$ are minimized with respect to the parameters.
In this manner, asymptotically normal estimates are obtained which are consistent, and
have minimum asymptotic variances within the class of estimates so determined by the
particular functions $\phi_1, \phi_2, \cdots, \phi_r$. It is possible, through a modification of this
procedure, to obtain estimates by solving a set of linear equations. If $v(x, \theta)$ has the form

$$v(x, \theta) = \exp\left\{ \beta_0(\theta) + \sum_{i=1}^{s} \beta_i(\theta)\, y_i(x) + y_0(x) \right\}$$

it can be shown that the best choice of the $\phi$'s is $y_1(x), y_2(x), \cdots, y_s(x)$. Maximum
likelihood estimates belong to this class of BAN estimates.



BOOK REVIEW 


The Theory of Games and Economic Behavior. John von Neumann and Oskar
Morgenstern, Princeton University Press, 1947, Second Edition, pp. xviii,
641. $10.00.


Reviewed by Leonid Hurwicz¹

Iowa State College

This review is devoted to the second edition of a book which from its first 
appearance was acknowledged to be a major contribution in the field of theory 
of rational behavior. As is pointed out in the Preface, “the second edition 
differs from the first in some minor respects only”. The main change is the 
addition of a proof (of “measurability” of utility) omitted in the first edition. 

The book’s objective is to solve the problem of rational behavior in a very 
general type of situation. 

It is, therefore, not surprising that its results are of relevance in many fields 
of knowledge, among them economics and statistical inference. 

In both economics and statistics the problem of rational behavior is a fundamental
one. Thus one of the classical problems treated by the economic theory
is that of profit maximization by a firm. The firm is assumed to be maximizing
its net profit which is a function of prices of the product, materials used, etc., as
well as the quantities used and produced. In the simplest case prices are taken
as given; more generally they are assumed to be functions (known to the firm)
of the quantities sold and purchased. But assuming this function to be known
presupposes the knowledge of behavior of other firms. This procedure has for
a long time been regarded as highly unsatisfactory; it is analogous to elaborating
the theory of rational behavior of a poker player on the assumption that he knows
the strategy of the other players.

It is the type of situation in which not only the behavior of various individuals,
but even their strategies, are interdependent, that is treated by von Neumann
and Morgenstern. The essence of their solutions is to base the optimal
strategy on the minimax principle. As applied to a game, the principle requires
that one should choose a strategy which minimizes the maximum loss
that could be inflicted by the opponent.

The minimax principle, when applied by both players, need not, in general,
lead to a stable solution. To ensure the existence of such a solution the authors
are led to the postulate that the choice of strategies be made through a random
process. The minimax to be found is that of the mathematical expectation of
the loss in the game. The latter postulate is of a restrictive nature² since it
implies that the game is played for numerical (“measurable”) stakes and that
the second and higher moments of the probability distribution of the losses are
immaterial. This restriction, however, has permitted the authors to go deeper
in other directions. Given the great complexity of the problem, even in its
restricted version, the authors’ decision can hardly be criticized. One could
only wish that similar considerations had made the authors more tolerant towards
other work in the field of economics than is shown in some sections of the book.

¹ On leave to the United Nations Economic Commission for Europe.

² See Jacob Marschak, “Neumann’s and Morgenstern’s New Approach to Static
Economics”, The Journal of Political Economy, Vol. LIV (1946).

The readers of the Annals will be particularly interested in the connection
between the Theory of Games and the theory of statistical inference.

As has been pointed out by Abraham Wald³ the problem faced by the statistician
is somewhat similar to that of a player in a game of strategy. The theory
of statistical inference may be viewed as a theory of rational behavior of the
statistician. His “strategy” consists in adopting an optimal test or estimate,
more generally an optimal decision function. This optimal decision function
must be chosen without the knowledge of the “a priori” distribution of the
population parameters. Wald’s basic postulate of minimization of maximum risk
is equivalent to regarding the statistician as a player in a game of strategy, with
“Nature” as the other player. The optimal decision function is chosen in a
way which (as shown by Wald) is equivalent to assuming the “least favorable”
a priori distribution of the parameters. As Wald says, “we cannot say that
Nature wants to maximize [the statistician’s risk]. However, if the statistician
is completely ignorant as to Nature’s choice, it is perhaps not unreasonable to
base the theory of a proper choice of [the decision function] on the assumption
that Nature wants to maximize [the statistician’s risk]”.

It may be noted, however, that statistical inference, as seen by Wald, is a 
relatively simple game since it involves only two players and is of the zero-sum 
variety. 
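
For a two-person zero-sum game of the kind just mentioned, the minimax principle and the role of random choice of strategies can be sketched in a few lines. The code below is only an illustration under stated assumptions: the game is a 2 x 2 loss matrix, the example matrix (a coin-matching game) is hypothetical and does not come from the book, and the function names are invented for this sketch.

```python
from fractions import Fraction


def pure_minimax(loss):
    """Return (row player's minimax loss, column player's maximin).

    loss[i][j] is the loss of the row player (and the gain of the column
    player) when the row player picks i and the column player picks j.
    """
    minimax = min(max(row) for row in loss)
    maximin = max(min(col) for col in zip(*loss))
    return minimax, maximin


def mixed_solution_2x2(loss):
    """Solve a 2 x 2 zero-sum game that has no pure-strategy saddle point.

    Returns (p, q, value): p is the probability that the row player uses
    row 0, q the probability that the column player uses column 0, and
    value the expected loss of the row player when both randomize so.
    """
    minimax, maximin = pure_minimax(loss)
    if minimax == maximin:
        raise ValueError("saddle point exists; pure strategies already suffice")
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in loss]
    denom = a - b - c + d
    p = (d - c) / denom  # equalizes the expected loss over both columns
    q = (d - b) / denom  # equalizes the expected loss over both rows
    value = (a * d - b * c) / denom
    return p, q, value


if __name__ == "__main__":
    matching = [[1, -1], [-1, 1]]  # hypothetical coin-matching game
    print(pure_minimax(matching))
    print(mixed_solution_2x2(matching))
```

In the coin-matching example the pure-strategy minimax and maximin differ, so no pure choice is stable, while choosing each strategy with probability one half gives both players the same expected loss of zero.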

The admiring and enthusiastic reception given to the book’s first edition would
make any further general appraisal somewhat anticlimactic. Suffice it to say
that a good deal of valuable work has already been stimulated by the Theory of
Games, both in the field of social sciences and in mathematics.

³ Abraham Wald, “Statistical Decision Functions which Minimize the Maximum Risk”,
Annals of Mathematics, Vol. 46, (1945).



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Paul H. Anderson, formerly an Economist with the War Assets Administration,
Washington, D. C., has been appointed Professor of Marketing at
Loyola University, New Orleans, Louisiana.

Mr. N. H. Carrier has resigned his position with the Mathematical Statistics
Section, Chief Scientific Advisers Division, Ministry of Works, England to accept
an appointment as Statistician in the General Register Office, Somerset
House, Strand, London, W. C. 2, England.

Dr. T. Freeman Cope has been promoted to a full professorship at Queens
College, Flushing, New York.

Dr. Wayne W. Gutzman, who was formerly at the Postgraduate School,
Naval Academy, Annapolis as an Assistant Professor, has accepted a professorship
in the Department of Mathematics, University of South Dakota.

Mr. Elvin A. Hoy has transferred from the position as Chief, Statistics Section,
Bureau of Research and Statistics in the Social Security Administration to
the position as Chief, Research Evaluation Section, Naval Reserve Training
Publications, Navy Department, Naval Gun Factory, Washington, D. C.

Dr. Joe J. Livers has been promoted to a full professorship at Montana State
College, Bozeman, Montana.

Professor Ernest S. Keeping has returned to his position at the University of
Alberta, Edmonton, Alberta, Canada after having spent the spring term of 1948
at the Institute of North Carolina.

Mr. Wharton F. Keppler of the M&R Dietetic Laboratories, Inc., Columbus,
Ohio has recently qualified as a Professional Industrial Engineer in the State of
Ohio.

Mr. Ralph Mansfield has formed his own company to manufacture electrical
testing equipment. The company is known as the Auto-Test, Incorporated,
with Mr. Mansfield acting as Vice-president and Chief Engineer.

Mr. Jack Moshman has resigned an instructorship in mathematics at the
University of Tennessee to accept the post of Statistician to the Medical Advisor,
United States Atomic Energy Commission, Oak Ridge, Tennessee.

Mr. Bernard E. Phillips has resigned his position as teacher of mathematics
in the Newark, New Jersey high schools to do statistical work for the Glenn L.
Martin Co., Baltimore, Maryland.

Dr. W. R. Van Voorhis, Associate Professor of Mathematics, Fenn College,
attended, as a representative of the Institute of Mathematical Statistics, the
inauguration ceremonies of Dr. Keith Glennan as President of Case Institute of
Technology, Cleveland, Ohio.




Atomic Energy Commission Fellowships 

The National Research Council is announcing a new program of fellowships
supported by funds provided by the Atomic Energy Commission as a part of the
Commission’s responsibility for future atomic energy research. Accordingly,
these fellowships will be awarded in the many fields of science related to research
in atomic energy.

A considerable number of these fellowships is available to young men and
women who wish to continue in graduate training or research for the doctorate
in an appropriate field of science. Others of these fellowships will provide training
in biophysics applied to the control of radiation hazards. An additional
number of fellowships will be assigned to those below the age of 35 who have
already achieved the doctorate and who wish to secure advanced research training
and experience in those aspects of the physical, biological and medical
sciences related to atomic energy. Tenure of the fellowship does not impose on
the fellow any commitment with regard to subsequent employment.

The candidates will be selected by the fellowship boards of the National Research
Council established for this program. In the postdoctoral field, there
will be three groups of fellowships, the basic stipend of which will be $3000. For
the selection of fellows for advanced research and training in the general field of
the physical sciences, a board has been established with Dr. Roger Adams,
Professor of Chemistry, University of Illinois, as chairman. In the general
field of the biological sciences, exclusive of the medical sciences, selections of
postdoctoral fellows will be made by a board under the chairmanship of Dr. R.
G. Gustavson, Chancellor of the University of Nebraska. For the selection of
postdoctoral fellows in the medical sciences, a board has been set up under the
chairmanship of Dr. Homer W. Smith, Professor of Physiology, College of
Medicine, New York University.

The program provides for two groups of fellows in the predoctoral field, with
stipends ranging from $1500-2100. One group of fellows will work in the biological
and basic medical sciences including applied biophysics related to the
measurement and control of radiation hazards and the effect of radiation upon
health. Selections will be made by a board under the chairmanship of Dr.
Douglas Whitaker, Professor of Biology, and Dean of the School of Biological
Sciences, Stanford University. Another group of predoctoral fellows will be
selected for study and research in the general field of the physical sciences.
Selections will be made by a board under the chairmanship of Dr. Henry A.
Barton, Director of the American Institute of Physics.

Fellowships will be granted for study and research in universities or other
nonprofit research establishments approved by the fellowship boards. Awards
will be made for the academic year 1948-49. Supervision of a fellow’s program
of work will be under the direction of the fellowship boards of the National
Research Council. Further information can be secured by writing to the
Fellowship Office, National Research Council, 2101 Constitution Avenue,
Washington 25, D. C.





Research Fellowships in Psychometrics 

The Educational Testing Service, Princeton, N. J., is offering for 1949-50 its
second series of research fellowships in psychometrics leading to the Ph.D.
degree at Princeton University. Open to men who are acceptable to the Graduate
School of the University, the two fellowships carry a stipend of $2,200 a
year and are normally renewable.

Fellows will be engaged in part-time research in the general area of psychological
measurement at the offices of the Educational Testing Service and will,
in addition, carry a normal program of studies in the Graduate School. Competence
in mathematics and psychology is a prerequisite for obtaining these
fellowships. Information and application blanks may be obtained from:
Director of Psychometric Fellowship Program, Educational Testing Service,
Box 592, Princeton, N. J.


Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Actuarial Society of America
and the American Institute of Actuaries to the nine undergraduates ranking
highest on the combined score on Part 1 and Part 2 of the 1948 Preliminary
Actuarial Examinations are as follows:

First Prize of $200
    Edward H. Larson ........ Massachusetts Institute of Technology
Additional Prizes of $100
    John E. Brownlee ........ Haverford College
    William L. Farmer ....... University of Alabama
    Joseph P. Fennell ....... Princeton University
    Bert F. Green, Jr. ...... Yale University
    Solomon Leader .......... Rutgers University
    Felix A. E. Pirani ...... University of Western Ontario
    Richard J. Semple ....... University of Toronto
    Charles A. Yardley ...... Dartmouth College

The two actuarial organizations have authorized a similar set of nine prize
awards for the 1949 Examinations on Part 2.

The Preliminary Actuarial Examinations consist of the following three examinations:

Part 1. Language Aptitude Examination.
(Reading comprehension, meaning of words and word relationships, antonyms, and
verbal reasoning.)

Part 2. General Mathematics Examination.
(Algebra, trigonometry, coordinate geometry, differential and integral calculus.)

Part 3. Special Mathematics Examination.
(Finite differences, probability and statistics.)





The 1949 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and
Canada on May 13-14, 1949. The closing date for applications is March 15,
1949.

Detailed information concerning the Examinations can be obtained from either
of the following organizations:

American Institute of Actuaries,
135 South LaSalle Street,
Chicago 3, Illinois.

The Actuarial Society of America,
393 Seventh Avenue,
New York 1, New York.


New Members 

The following persons have been elected to membership in the Institute
(March 1 to May 31, 1948).

Alder, Arthur, Ph.D. (Univ. of Berne) Professor of Actuarial Science, University of Berne,
Schlaeflistrasse 2, Berne, Switzerland.

Andrews, Fred C., B.S. (Univ. of Washington) Research Fellow, Department of Mathematics,
University of Washington, H-l Savery Hall, University of Washington, Seattle,
Washington.

Archer, John, Actuary, Pensions Section, Lever Brothers and Unilever Ltd., SA Spencer
Hill, Wimbledon, S. W. 19, England.

Benitz, Paul A., M.A. (Stanford Univ.) 17$ Serpentine Road, Tenafly, New Jersey.

Bennett, George K., Ph.D. (Yale) President of the Psychological Corporation, 522 Fifth
Avenue, New York 18, New York.

Berrettoni, J. N., Ph.D. (Univ. of Minnesota) Professional Consultant in Statistics and
Economics, 63$ Erie St., S. E., Minneapolis 1.^, Minnesota.

Birnbaum, Allan, A.B. (Univ. of Calif., Los Angeles) Teaching Assistant, Mathematical
Statistics Department, Columbia University, 500 Riverside Drive, Room ^$4, New
York $7, New York.

Blank, Mark, M.A. (Univ. of Pennsylvania) Instructor of Philosophy, University of
Pennsylvania, E Sedgwick, Philadelphia, Pa.

Blumen, Isadora, B.A. (Univ. of Minn.) Student, Department of Mathematical Statistics,
University of North Carolina, Chapel Hill, North Carolina.

Burdick, Wayne E., M.A. (Univ. of Mich.) Student, University of Michigan, SI 4 S Fifth
Avenue, Ann Arbor, Michigan.

Chaturvedi, Jagdish C., M.Sc. (Agra Univ., India) Lecturer in Statistics, St. John’s College,
37, Delhi Gate, Agra, U. P., India.

Cote, Louis J., A.M. (Univ. of Mich.) Student, University of Michigan, 315 North State
Street, Ann Arbor, Michigan.

Dunleavy, Mary, A.B. (Hunter College, New York) Statistician, E. I. Dupont de Nemours,
657 Second Avenue, New York 16, New York.

Ferber, Robert, M.A. (Univ. of Chicago) Student, University of Chicago, 54 West 89th Street,
New York 24, New York.

Forman, John W., M.S. (Univ. of Iowa) Graduate Assistant, Department of Mathematics,
State University of Iowa, Iowa City, Iowa.





Franklin, Nathan M., M.S. (Univ. of Mich.) Student, Univ. of Michigan, Box 196, Moodus,
Connecticut.

Fraser, Donald A. S., M.A. (Univ. of Toronto) Instructor in Statistics, Graduate College,
Princeton, New Jersey.

Grabowski, Edwin F., A.B. (George Washington Univ.) Student, George Washington
University, ISSO-SOlh Street, N. W., Washington, D. C.

Healy, William C., Jr., B.S.E. (Univ. of Mich.) Student, University of Michigan, B89
Lincoln, Grosse Pointe, Michigan.

Heimdahl, Olaf E. W., A.B. (Luther College, Washington) Teaching Fellow, Department of
Mathematics, University of Washington, J 1 S 86 Union Bay Lane, Seattle E, Washington.

Henriksen, Robert O., B.Sc. (Univ. of Mich.) Student, University of Michigan, 7 S 1 Clancy
Avenue, Grand Rapids, Michigan.

Howard, William G., B.S. (Western Carolina Teachers College, Cullowhee, N. C.) Student,
Institute of Statistics, University of North Carolina, Route 1, Morrisville, North
Carolina.

Irick, Paul E., M.S. (Purdue Univ.) Mathematics Instructor, Purdue University, 799
North Grant St., West Lafayette, Indiana.

Johnson, Elgy S., M.A. (Univ. of Mich.) Student, University of Michigan, 1S907 Lincoln
Street, Detroit S, Michigan.

Kaplan, E. L., B.S. (Carnegie Inst. of Tech.) Mathematician, Naval Ordnance Laboratory,
llf97 N. St., N. W., Washington 6, D. C.

Kaufman, Arthur, M.A. (Columbia Univ.) Student and Lecturer of Mathematics, Columbia
University, 18S0 Sheridan Avenue, New York SO, New York.

Link, Richard F., B.S. (Univ. of Oregon) 7B0 W. Sixth St., Eugene, Oregon.

Marks, Charles L., M.A. (Univ. of North Carolina) Instructor of Mathematics, University
of North Carolina, SIS Mangum Dormitory, University of North Carolina, Chapel Hill,
North Carolina.

Marquardt, Mary, M.A. (Univ. of Illinois) Assistant Professor of Statistics, New York State
School of Industrial and Labor Relations, Cornell University, Ithaca, New York.

Mickey, Max R., Jr., B.S. (Virginia Polytechnic Institute) Graduate Student and Graduate
Assistant, Department of Mathematics, Iowa State College, 706 Ash Avenue, Ames,
Iowa.

Mindlin, Albert, B.A. (Univ. of California, Los Angeles) Teaching Assistant, Mathematics
Department, University of California, £444 Carleton Street, Berkeley 4, California.

Morris, William S., A.B. (Princeton) Statistician, First Boston Corporation, 100 Broadway,
New York 5, New York.

Netzorg, Morton J., Engineer, Development Tire Engineering Department, U. S. Rubber
Co., Detroit, Michigan, £S£S Gladstone, Detroit 6, Michigan.

Norton, James A., Jr., A.B. (Antioch College) Graduate Research Assistant, Veterans
Guidance Center, Purdue University, West Lafayette, Indiana.

Perrin, John K., A.B. (Columbia College) Assistant Statistician, American Telephone &
Telegraph Co., 195 Broadway, New York 7, New York.

Perry, Norman C., M.A. (Univ. of Southern Calif.) Lecturer in Mathematics, Mathematics
Department, University of Southern California, Los Angeles, California.

Powell, Charles Jr., Actuary, Coates and Herfurth, Consulting Actuaries, 116 S. Virgil
Avenue, Los Angeles 4, California.

Raiffa, Howard, B.S. (Univ. of Mich.) Student, University of Michigan, I 44 I Enfield Court,
Willow Run Village, Michigan.

Raup, Joan E., B.A. (Barnard College) Statistical Analyst, Bureau of the Budget, 1438 N
Street, N. W., Washington 5, D. C.

Rubinstein, David, B.S. (Univ. of Wash.) Research Assistant, Statistical Laboratory,
University of California, £218 Parker Street, Berkeley 4, California.





Schlenz, John W., B.S. (Baldwin-Wallace College) Student, University of Michigan, 8S06
Vineyard Avenue, Cleveland 5, Ohio.

Scott, Elizabeth L., A.B. (Univ. of California) Research Assistant, Statistical Laboratory,
Department of Mathematics, University of California, Berkeley 4, California.

Seidman, Herbert, B.A. (Brooklyn College) Junior Statistician, Chief, Statistical Information
Section, New York University and Student, New York University, S170 New
York Avenue, Brooklyn 10, New York.

Shaw, Oliver A., B.A. (Univ. of Mississippi) U. S. Air Force, ^S1 Brook Lane, N. W.,
Washington, D. C.

Shellard, Gordon D., B.S. (Mass. Institute of Tech.) Assistant Section Head, Underwriting
Studies Section, Actuarial Division, Metropolitan Life Insurance Co., 0 Mountain
Avenue, Ridgewood, New Jersey.

Shepherd, Clarence M., M.S. (Case Institute of Tech.) Electrochemical Research Chemist,
SOSO Nichols Avenue, S. W., Washington, D. C.

Shrikhande, Sharad-Chandra S., B.Sc. (Nagpur Univ., India) Graduate student, Department
of Mathematical Statistics, University of North Carolina, Chapel Hill, North
Carolina.

Sirlin, Robert, M.A. (Columbia Univ.) Statistician, Financial Analysis, 20^8 East 23rd
Street, Brooklyn 20, New York.

Stacy, Edney W., A.B. (Univ. of North Carolina) Instructor of Mathematics, University
of North Carolina, SOI W. Franklin Street, Chapel Hill, North Carolina.

Sternhell, Charles M., B.S. (College of City of N. Y.) Section Head, Actuarial Division,
Metropolitan Life Insurance Co., 1 Madison Avenue, New York City, New York.

Tang, Pei-Ching, Ph.D. (Univ. College, London Univ.) Professor, National Central
University, Nanking, China.

Whitson, Milo E., A.M. (Geo. Peabody College, Nashville) Head of Mathematics Department,
California State Polytechnic College, S2S Lawrence Dr., San Luis Obispo,
California.

Watson, Geoffrey S., B.A. (Univ. of Melbourne) Student, Institute of Statistics, State
College, Raleigh, North Carolina.

Woolsey, Theodore D., B.A. (Yale Univ.) Biostatistician, Division of Public Health Methods,
U. S. Public Health Service, 111 Fest Underwood St., Chevy Chase 15, Maryland.

Wymer, John P., M.A. (Univ. of California, Berkeley) Statistician, U. S. Bureau of Labor
Statistics, 719 Whittier St., N. W., Washington 12, D. C.

Yerushalmy, Jacob, Ph.D. (Johns Hopkins Univ.) Professor of Biostatistics, School of
Public Health, University of California, Berkeley 4, California.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The Thirty-fourth Meeting and the Third Regional West Coast Meeting of
the Institute of Mathematical Statistics was held on the Berkeley Campus of the
University of California June 22nd through June 24th, 1948, in conjunction
with the Twenty-ninth Annual Meeting of the Pacific Division of the American
Association for the Advancement of Science. During the meeting 115 persons
registered, including the following members of the Institute:

G. A. Baker, Blair M. Bennett, Carl A. Bennett, Z. Wm. Birnbaum, David Blackwell,
Albert H. Bowker, George W. Brown, A. George Carlton, Douglas G. Chapman, Andrew G.
Clark, Edwin L. Crow, Dorothy Ciudeu, Harold Davis, K. C. Davis, W. J. Dixon, Robert
Dorfman, George Eldredge, Lillian Elveback, Mary Elveback, Benjamin Epstein, M. W.
Eudey, Evelyn Fix, Merrill M. Flood, H. H. Germond, Meyer A. Girshick, Eugene L.
Grant, John Gurland, T. E. Harris, J. L. Hodges, Jr., Paul G. Hoel, John M. Howell, Harry
M. Hughes, Leo Katz, H. S. Konijn, T. C. Koopmans, George W. Kuznets, E. L. Lehmann,
Richard F. Link, A. M. Mood, Stanley W. Nash, J. Neyman, Stefan Peters, G. Baley Price,
Kathryn B. Holfe, Leonard J. Savage, Henry Scheffé, Howard L. Sehug, Elizabeth L.
Scott, Esther Seiden, Milton Sobel, Zenon Szatrowski, John W. Tukey, J. R. Vatnsdal,
A. Wald, John E. Walsh, Holbrook Working, Zivia S. Wurtele.

The Tuesday morning session was devoted to a symposium on Mathematical
Theory of Games with Professor G. C. Evans of the University of California,
Berkeley, as chairman. Addresses were:

1. Survey of von Neumann’s mathematical theory of games. J. C. C. McKinsey, Project
Rand.

2. Recent developments in the mathematical theory of games. Olaf Helmer, Project Rand.

3. Applications of theory of games to statistics. Abraham Wald, Columbia University.

4. On continuous games. Henri F. Bohnenblust, California Institute of Technology.

5. Discussion. Edward W. Barankin, University of California, Berkeley.

The attendance was approximately 130. 

The Tuesday afternoon session was under the chairmanship of Professor Henri
F. Bohnenblust of the California Institute of Technology. The invited address,
Complete Classes of Statistical Decision Functions, by Professor Abraham Wald
was followed by tea in Senior Women’s Hall and then the following contributed
papers:

1. Identification as a problem of inference. T. C. Koopmans, Cowles Commission for
Research in Economics.

Discussion: Olav Reiersøl, University of Oslo.

2. Estimation of parameters for truncated multinormal distributions. Z. W. Birnbaum,
E. Paulson and F. C. Andrews, University of Washington.

3. A test of the hypothesis that a sample of three came from the same normal distribution.
Carl A. Bennett, General Electric Company.

4. A note on the application of the abbreviated Doolittle solution to nonorthogonal analysis
of variance and covariance. (By title.) Carl A. Bennett, General Electric Company.

The attendance was between 100 and 150 during the afternoon.



The Wednesday morning session was devoted to a symposium on Design of
Experiments with Particular Reference to Agricultural Trials. Dean A. R.
Davis of the University of California, Berkeley, presided briefly and then
Professor Abraham Wald took over the duties of chairman. The papers were:

1. Recent advances in experimental design. R. C. Bose, University of Calcutta.

2. Yield trials with backcrossed derived lines of wheat. G. A. Baker and F. N. Briggs,
University of California at Davis.

3. Selecting a subset which includes the largest of a number of means. Charles Stein,
University of California, Berkeley.

4. Discussion. A. G. Clark, Colorado State College; S. W. Nash, University of
California, Berkeley; J. R. Vatnsdal, State College of Washington.

5. The effect of inbreeding on height at withers in a herd of Jersey cattle. W. C. Rollins,
S. W. Mead and W. M. Regan, University of California at Davis.

Attendance was about 100. 

The afternoon session, under the chairmanship of Professor George Pólya of
Stanford University, began with an invited address by Professor Michel Loève,
University of California, Berkeley, on Random Functions and Related Problems.
This was followed by the contributed papers:

1. An example of a singular continuous distribution. (By title.) Henry Scheffé,
University of California at Los Angeles.

2. On the theory of some nonparametric hypotheses. E. L. Lehmann and Charles Stein,
University of California, Berkeley.

3. Compound randomization in the binary system. John E. Walsh, Project Rand.

4. A multiple decision problem arising in the analysis of variance. Edward Paulson,
University of Washington.

5. Recurrence formulae for the moments and seminvariants of the joint distribution of the
sample mean and variance. Olav Reiersøl, University of Oslo.

6. Identification problem in factor analysis. (By title.) Olav Reiersøl, University of
Oslo.

7. On distinct hypotheses. Mrs. Agnes Berger, Columbia University.

The attendance was approximately 100. 

A symposium on Sampling for Industrial Use occupied the Thursday morning
session. Professor Z. W. Birnbaum of the University of Washington presided.

1. Sampling plans for continuous production. M. A. Girshick, Project Rand.

2. Sampling plans with continuous variables for acceptance inspection. A. H. Bowker,
Stanford University.

3. Place of statistical sampling in the education of engineers. E. L. Grant, Stanford
University.

4. Discussion. Henry Scheffé, University of California at Los Angeles; Charles Stein,
University of California, Berkeley; Holbrook Working, Stanford University.

The attendance was approximately 100. 

The first part of the afternoon session, presided over by Professor W. J. Dixon, 
University of Oregon, was devoted to contributed papers: 

1. Statistical problems of medical diagnosis. Jerzy Neyman, University of California, 
Berkeley. 

Discussion: L. J. Savage, University of Chicago. 

2. Power of certain tests relating to medical diagnosis. C. L. Chiang and J. L. Hodges, 
University of California, Berkeley. 

3. On best asymptotically normal estimates. Edward W. Barankin and John Gurland, 
University of California, Berkeley. 

4. Iterative treatment of continuous birth processes. T. E. Harris, Project Rand. 

5. Estimation of means on the basis of preliminary tests of significance. Blair M. Bennett, 
University of California, Berkeley. (By title.) 

The attendance was about 90. 

The second part of the afternoon session was the Business Meeting. Professor 
Abraham Wald, President of the Institute, presided. It was recommended that 
two meetings a year be held on the West Coast, one in June in the San Francisco 
Bay Area alternating between Berkeley and Stanford and the other during the 
winter alternating between the North and Los Angeles. The next West Coast 
meeting will be held during the Thanksgiving recess at Seattle. 




Contents 

Testing Compound Symmetry in a Normal Multivariate Distribution. D. F. Votaw, Jr. 

Branching Processes. T. E. Harris 

Most Powerful Tests of Composite Hypotheses. I. Normal Distributions. E. L. Lehmann and C. Stein 

Symbolic Matrix Derivatives. Paul S. Dwyer and M. S. MacPhail 

On the Limiting Distributions of Estimates Based on Samples from Finite Universes. William G. Madow 

A Non-Parametric Test of Independence. Wassily Hoeffding 

On Prediction in Stationary Time Series. Herman O. A. Wold 

Generalization to N Dimensions of Inequalities of the Tchebycheff Type. Burton H. Camp 

Boundaries of Minimum Size in Binomial Sampling. R. L. Plackett 

















TESTING COMPOUND SYMMETRY IN A NORMAL 
MULTIVARIATE DISTRIBUTION 


By David F. Votaw, Jr. 

Yale University 

Summary. In this paper test criteria are developed for testing hypotheses 
of "compound symmetry" in a normal multivariate population of $t$ variates 
($t \ge 3$) on the basis of samples. A feature common to the twelve hypotheses 
considered is that the set of $t$ variates is partitioned into mutually exclusive subsets 
of variates. In regard to the partitioning, the twelve hypotheses can be divided 
into two contrasting but very similar types, and the six in one type can be paired 
off in a natural way with the six in the other type. Three of the hypotheses 
within a given type are associated with the case of a single sample and moreover 
are simple modifications of one another; the remaining three are direct extensions 
of the first three, respectively, to the case of $k$ samples ($k \ge 2$). The gist of any 
of the hypotheses is indicated in the following statement of one, denoted by 
$H_1(mvc)$: within each subset of variates the means are equal, the variances are equal 
and the covariances are equal and between any two distinct subsets the covariances 
are equal. 

The twelve sample criteria for testing the hypotheses are developed by the 
Neyman-Pearson likelihood-ratio method. The following results are obtained 
for each criterion (assuming that the respective null hypotheses are true) for 
any admissible partition of the t variates into subsets and for any sample size, 
N, for which the criterion’s distribution exists: (i) the exact moments, (ii) an 
identification of the exact distribution as the distribution of a product of 
independent beta variates, (iii) the approximate distribution for large N. Exact 
distributions of the single-sample criteria are given explicitly for special values 
of t and special partitionings. 

Certain psychometric and medical research problems in which hypotheses of 
compound symmetry are relevant are discussed in section 1. Sections 2-6 give 
statements of the hypotheses and an illustration, for $H_1(mvc)$, of the technique 
of obtaining the moments and identifying the distributions. Results for the 
other criteria are given in sections 7-8. Illustrative examples showing 
applications of the results are given in section 9. 

1. Introduction. In studying psychological examinations, or other measuring 
devices, one may wish to test whether several forms of an examination may be 
used interchangeably. Consider the case of three forms, and assume that 
scores of individuals on the three forms have a normal 3-variate distribution. 
The hypothesis of interchangeability is equivalent to the hypothesis that in the 
normal distribution the means are equal, the variances are equal, and the 
covariances are equal. When this hypothesis is true, the normal distribution is 
invariant over all permutations of the variates and is said to have complete 
symmetry. It is frequently more important, however, to test not only that the forms 
have completely symmetric relations with each other but also that they are 
interchangeable with regard to their relation to some outside criterion measure (e.g., 
the criterion might be skill in a given task). Assuming that the scores of 
individuals on the three forms and the criterion have a normal 4-variate 
distribution, the hypothesis of interchangeability is equivalent to the hypothesis of 
equality of means, equality of variances, and equality of covariances among the 
three forms and equality of covariances between forms and criterion. When 
this hypothesis is true, the 4-variate normal distribution is invariant over all 
permutations of the three variates (associated with forms) among themselves, 
and so the variance-covariance matrix has the following form: 


$$\begin{pmatrix}
A & C & C & C \\
C & B & D & D \\
C & D & B & D \\
C & D & D & B
\end{pmatrix}$$


where the quantity $A$ represents the variance of the criterion measure. A 
normal distribution for which this hypothesis is true is said to have compound 
symmetry (of type I). A more general case of compound symmetry (of type I) 
arises when there are several examinations (no two of which need have the same 
number of forms) and several outside criteria. 
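
The invariance described above can be illustrated with a short sketch. It is not part of the original paper: the function names and the numerical values of A, B, C, D are assumptions chosen for the example. The sketch builds the 4 x 4 pattern with the criterion as the first variate and checks that reordering the three forms among themselves leaves the matrix unchanged, while exchanging the criterion with a form does not.

```python
from itertools import permutations

def type1_matrix(A, B, C, D):
    """4x4 pattern: index 0 is the criterion, indices 1-3 are the three forms."""
    return [
        [A, C, C, C],
        [C, B, D, D],
        [C, D, B, D],
        [C, D, D, B],
    ]

def permute(matrix, order):
    """Apply the same reordering to the rows and the columns."""
    return [[matrix[i][j] for j in order] for i in order]

def invariant_under_form_permutations(matrix):
    """True if every reordering of variates 1-3 leaves the matrix unchanged."""
    return all(permute(matrix, (0,) + p) == matrix
               for p in permutations((1, 2, 3)))

# Hypothetical values, chosen only for illustration.
sigma = type1_matrix(A=4.0, B=9.0, C=1.5, D=2.0)
print(invariant_under_form_permutations(sigma))   # forms among themselves
print(permute(sigma, (1, 0, 2, 3)) == sigma)      # criterion exchanged with a form
```

With A different from B, the second check fails, which reflects that the criterion is not interchangeable with the forms.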
The hypothesis of complete symmetry may arise in certain medical-research 
problems. For example, suppose a measurement (e.g., %CO2 in blood) is made 
at each of three times (say $T_1$, $T_2$, $T_3$) on an experimental animal and assume 
that the three quantities have a normal 3-variate distribution; one may then be 
interested in testing the hypothesis of complete symmetry on the basis of 
measurements (considered as a random sample) made on several experimental animals. 
More generally, let there be two characteristics, say $U$ and $W$ (e.g., %CO2 in 
blood and %O2 in blood), which are both measured at each of two times, $T_1$, 
$T_2$. Let it be assumed that the four quantities, which we represent as $U_{T_1}$, 
$U_{T_2}$, $W_{T_1}$, $W_{T_2}$, have a normal 4-variate distribution. One may then be 
interested in testing the hypothesis that the means of the first two variates are 
equal, the means of the second two are equal, and the variance-covariance matrix 
has the form: 


$$\begin{pmatrix}
E & F & K & L \\
F & E & L & K \\
K & L & G & J \\
L & K & J & G
\end{pmatrix}$$


When this hypothesis is true, the 4-variate distribution is said to have compound 
symmetry (of type II). A more general case of compound symmetry (of type II) 
arises when there are $h$ characteristics and $n$ times ($h, n = 2, 3, \dots$). 
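
A small sketch may help to see which reorderings leave the type II pattern unchanged. It is an illustration only: the function names and the numerical values are assumptions, and the variates are taken in the order U at T1, U at T2, W at T1, W at T2. Exchanging the two times for both characteristics at once preserves the matrix; exchanging them for U alone does not, unless K equals L.

```python
def type2_matrix(E, F, G, J, K, L):
    """Order of variates: U at T1, U at T2, W at T1, W at T2."""
    return [
        [E, F, K, L],
        [F, E, L, K],
        [K, L, G, J],
        [L, K, J, G],
    ]

def permute(matrix, order):
    """Apply the same reordering to the rows and the columns."""
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical values, chosen only for illustration.
sigma = type2_matrix(E=5.0, F=1.0, G=7.0, J=2.0, K=0.8, L=0.3)

swap_times_everywhere = (1, 0, 3, 2)   # T1 <-> T2 for both U and W
swap_times_for_u_only = (1, 0, 2, 3)   # T1 <-> T2 for U alone

print(permute(sigma, swap_times_everywhere) == sigma)
print(permute(sigma, swap_times_for_u_only) == sigma)
```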







Either of the two types of compound symmetry is a direct extension of complete 
symmetry. Wilks [5] has thoroughly treated the sampling theory of certain 
criteria for testing various hypotheses of complete symmetry regarding a normal 
multivariate distribution. 

The problems dealt with in this paper are: (i) to give sample criteria for 
testing hypotheses of compound symmetry regarding a normal multivariate 
distribution, and (ii) to give the moments and identify the distribution of each 
sample criterion when the corresponding hypothesis is true. 

The hypotheses are stated in section 2. Certain properties of compound 
symmetric normal distributions are given in section 3. Sections 4, 5, and 6 together 
give the method of deriving each sample criterion and the methods of obtaining 
the criterion's moments and identifying its distribution; the methods are 
illustrated for one of the hypotheses. Sections 7-8 give the other criteria and their 
moments together with approximate distributions of the criteria for large sample 
sizes. Exact distributions of some of the criteria are given in section 7g for 
certain special compound symmetries. Section 9 contains two illustrative 
examples. 

2. Statements of hypotheses. Let $\Pi$ be a normal $t$-variate population and 
$X_i$ ($i = 1, \dots, t$) ($t \ge 3$) be the $i$-th variate in $\Pi$. Let the set of variates $X_1$, 
$X_2, \dots, X_t$ be partitioned into $q$ mutually exclusive subsets of which, say, 
$b$ subsets contain exactly one variate each and the remaining $q - b = h$ subsets 
(where $h \ge 1$) contain $n_1, \dots, n_h$ variates, respectively, where $n_a \ge 2$ 
($a = 1, \dots, h$; $b + \sum_{a=1}^{h} n_a = t$). No generality is lost in assuming that the $t$ 
variates are ordered so that the first $b$ belong to the $b$ subsets containing one 
variate each, the next $n_1$ variates belong to the $(b + 1)$-th subset, $\dots$, the last $n_h$ 
variates to the $q$-th subset, where $n_1 \le n_2 \le \dots \le n_h$. Let $(1^b, n_1, n_2, \dots, n_h)$ 
represent such a partition of the variates $X_1, \dots, X_t$ into subsets; when $b = 0$ 
the term $1^b$ will be omitted. The notation can be simplified when $n_1, n_2, \dots, 
n_h$ are not all unequal, e.g., $(1^b, 2, 2)$ can be written as $(1^b, 2^2)$. 
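
The partition notation can be made concrete with a small sketch. It is an illustration under assumptions: the function name `partition_groups` and its argument names are hypothetical, and the variates are numbered 1 to t in the order just described, with the b single-variate subsets first.

```python
def partition_groups(b, sizes):
    """Index groups (1-based) for the partition (1^b, n_1, ..., n_h)."""
    if not sizes:
        raise ValueError("at least one subset with two or more variates is needed")
    if any(n < 2 for n in sizes):
        raise ValueError("each n_a must be at least 2")
    if list(sizes) != sorted(sizes):
        raise ValueError("the n_a must be in non-decreasing order")
    groups = [[s] for s in range(1, b + 1)]      # the b single-variate subsets
    start = b + 1
    for n in sizes:                              # the h larger subsets
        groups.append(list(range(start, start + n)))
        start += n
    return groups

# Partition (1, 3): one outside criterion and three forms, t = 4, q = 2.
print(partition_groups(b=1, sizes=[3]))   # [[1], [2, 3, 4]]
```

The number of groups returned is q = b + h, and the indices cover 1 to t exactly once.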

In the statement of each of the following six hypotheses it is assumed that there 
is a preassigned partition $(1^b, n_1, n_2, \dots, n_h)$ of the $t$ variates into $q$ subsets 
($q = b + h$). 

(1) $H_1(mvc)$: The hypothesis that within each subset of variates the means 
are equal, the variances are equal, and the covariances are equal and that 
between any two distinct subsets of variates the covariances are equal. 

(2) $H_1(vc)$: The hypothesis that within each subset of variates the variances 
are equal and the covariances are equal and that between any two distinct 
subsets of variates the covariances are equal. 

(3) $H_1(m)$: The hypothesis that within each subset of variates the means are 
equal, given that $H_1(vc)$ is true. 

(4) $H_k(MVC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions are 
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the 
variates into subsets ($k \ge 2$). 





(5) $H_k(VC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions have 
the same variance-covariance matrix, given that they all satisfy $H_1(mvc)$ for a 
particular division of the variates into subsets ($k \ge 2$). 

(6) $H_k(M \mid mVC)$: The hypothesis that $k$ normal $t$-variate distributions are 
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the 
variates into subsets and that they all have the same variance-covariance matrix 
($k \ge 2$). 

Any of the hypotheses stated above can be expressed in terms of an invariance 
condition on the normal $t$-variate distribution (or distributions); e.g., $H_1(mvc)$ 
is equivalent to the hypothesis that the distribution is invariant over all 
permutations of the variates within subsets. The pattern of symmetry present in the 
variance-covariance matrix of the distribution when any of the above six 
hypotheses is true is given in section 3 (see (3.2)). 

Six additional hypotheses, $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, $\dots$, $\bar{H}_k(M \mid mVC)$, which are 
modifications of $H_1(mvc)$, $H_1(vc)$, $\dots$, $H_k(M \mid mVC)$, respectively, will also be 
considered. In regard to any of these six $\bar{H}$ hypotheses, it is assumed that there 
is a partition $(n^h)$ ($n = 2, 3, \dots$) of the $t$ variates ($t = nh$) and that in each subset 
the variates are in a given order; thus each subset has $n$ variates and between 
any two distinct subsets of variates there are $n^2$ covariances, which form an $n \times n$ 
"block" in the variance-covariance matrix of the distribution (see (3.4)). The 
hypotheses may now be stated as follows: 

$\bar{H}_1(mvc)$: The hypothesis that within each subset of variates the means are 
equal, the variances are equal, and the covariances are equal and that between 
any two distinct subsets of variates the diagonal covariances are equal and the 
off-diagonal covariances are equal. 

$\bar{H}_1(vc)$: The hypothesis that within each subset of variates the variances are 
equal and the covariances are equal and that between any two distinct subsets 
of variates the diagonal covariances are equal and the off-diagonal covariances are 
equal. 

The statement of any of the hypotheses $\bar{H}_1(m)$, $\bar{H}_k(MVC \mid mvc)$, $\bar{H}_k(VC \mid mvc)$, 
and $\bar{H}_k(M \mid mVC)$ is obtained from the statement of the corresponding $H$ 
hypothesis by simply substituting $\bar{H}$ for $H$. The pattern of symmetry present 
in the variance-covariance matrix of the distribution when any of the six $\bar{H}$ 
hypotheses is true is given in section 3 (see (3.4)), from which the appropriate 
invariance condition on the normal distribution can be obtained. 

A test of any of the hypotheses $H_1(mvc)$, $\bar{H}_1(mvc)$, $H_1(vc)$, $\bar{H}_1(vc)$, $H_1(m)$, $\bar{H}_1(m)$ 
is based on a random sample from a normal multivariate distribution; a test of 
any of the remaining hypotheses is based on $k$ random samples from $k$ normal 
multivariate distributions, respectively ($k \ge 2$). 

A normal distribution for which an $H$ or $\bar{H}$ hypothesis is true will be called 
compound symmetric. In the special case where the compound symmetry holds 
for a partition $(t)$ of the $t$ variates, any $H$ hypothesis and the $\bar{H}$ hypothesis 
corresponding to it are identical; in this case the normal distribution will be 
called completely symmetric. Problems (i) and (ii) (see section 1) have been 
solved completely by Wilks [5] for $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$ for the case of 
complete symmetry. 

3. Block symmetric matrices and vectors. Let $m_i$ be the mean value of $X_i$ 
and $\|\rho_{ij}\sigma_i\sigma_j\|$ be the variance-covariance matrix of $X_1, \dots, X_t$ ($i, j = 1, \dots, t$) 
($\rho_{ij}$ is the coefficient of correlation between $X_i$ and $X_j$). The joint probability 
density function¹ of $X_1, X_2, \dots, X_t$ is 

(3.1) $f(X_1, \dots, X_t) = \pi^{-t/2}\,|G_{ij}|^{1/2}\exp\Big[-\sum_{i,j} G_{ij}(X_i - m_i)(X_j - m_j)\Big]$, 

where $\|G_{ij}\|$ is positive definite and its inverse $\|G^{ij}\| = \|2\rho_{ij}\sigma_i\sigma_j\|$. 

When any of the $H$ hypotheses is true (see section 2), we represent $\|G^{ij}\|$ 
by $\|\Lambda^{ij}\|$ (also $\|G_{ij}\|$ by $\|\Lambda_{ij}\|$), which can be written as (3.2) (see page 452), 
where $A^{ss'} = A^{s's}$ ($s, s' = 1, \dots, b$) and $D^{aa'} = D^{a'a}$ ($a, a' = 1, \dots, h$; $a \ne a'$). 
The $A$'s and $B$'s with single superscripts and the $C$'s and $D$'s have been 
introduced to indicate the block pattern clearly. In general $C^{sa} = C^{as}$ only if 
$a = s$ ($s = 1, \dots, b$; $a = 1, \dots, h$). $\|\Lambda_{ij}\|$ and $\|\Lambda^{ij}\|$ have the same 
block pattern of symmetry. 

The blocks in (3.2) are formed by making a partition $(1^b, n_1, n_2, \dots, n_h)$ of the 
$t$ rows and $t$ columns of $\|\Lambda^{ij}\|$. A matrix having the block pattern of 
symmetry of (3.2) will be called block symmetric of type I. Clearly a block symmetric 
matrix of type I is invariant over all permutations of its rows and columns within 
the subsets determined by $(1^b, n_1, \dots, n_h)$, if the row interchanges and column 
interchanges are the same. Also, a $t$-component vector will be called block 
symmetric if the order of values of the components is invariant over all 
permutations of the components within groups determined by $(1^b, n_1, \dots, n_h)$. 

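The determinant of the block symmetric matrix $\|\Lambda_{ij}\|$ is 

(3.3) $|\Lambda_{ij}| = \Delta \prod_{a=1}^{h} (A_a - B_a)^{n_a - 1}$, 

where 

$$\Delta = \begin{vmatrix}
A_{11} & \cdots & A_{1b} & C'_{11} & \cdots & C'_{1h} \\
\vdots & & \vdots & \vdots & & \vdots \\
A_{b1} & \cdots & A_{bb} & C'_{b1} & \cdots & C'_{bh} \\
C'_{11} & \cdots & C'_{b1} & A'_{1} & \cdots & D'_{1h} \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
C'_{1h} & \cdots & C'_{bh} & D'_{h1} & \cdots & A'_{h}
\end{vmatrix},$$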


¹ In general a chance quantity and the variable of its distribution function will be 
denoted by the same symbol. 







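(3.2) 

$$\|\Lambda^{ij}\| = \left(\begin{array}{ccc|cccc|c|cccc}
A^{11} & \cdots & A^{1b} & C^{11} & C^{11} & \cdots & C^{11} & \cdots & C^{1h} & C^{1h} & \cdots & C^{1h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\
A^{b1} & \cdots & A^{bb} & C^{b1} & C^{b1} & \cdots & C^{b1} & \cdots & C^{bh} & C^{bh} & \cdots & C^{bh} \\ \hline
C^{11} & \cdots & C^{b1} & A^{1} & B^{1} & \cdots & B^{1} & \cdots & D^{1h} & D^{1h} & \cdots & D^{1h} \\
C^{11} & \cdots & C^{b1} & B^{1} & A^{1} & \cdots & B^{1} & \cdots & D^{1h} & D^{1h} & \cdots & D^{1h} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & & \vdots \\
C^{11} & \cdots & C^{b1} & B^{1} & B^{1} & \cdots & A^{1} & \cdots & D^{1h} & D^{1h} & \cdots & D^{1h} \\ \hline
\vdots & & \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots & & \vdots \\ \hline
C^{1h} & \cdots & C^{bh} & D^{h1} & D^{h1} & \cdots & D^{h1} & \cdots & A^{h} & B^{h} & \cdots & B^{h} \\
C^{1h} & \cdots & C^{bh} & D^{h1} & D^{h1} & \cdots & D^{h1} & \cdots & B^{h} & A^{h} & \cdots & B^{h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \ddots & \vdots \\
C^{1h} & \cdots & C^{bh} & D^{h1} & D^{h1} & \cdots & D^{h1} & \cdots & B^{h} & B^{h} & \cdots & A^{h}
\end{array}\right)$$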














where $C'_{sa} = C_{sa}\sqrt{n_a}$, $A'_a = A_a + (n_a - 1)B_a$, and $D'_{aa'} = D_{aa'}\sqrt{n_a n_{a'}}$ 
($s = 1, \dots, b$; $a, a' = 1, \dots, h$; $a \ne a'$); $A_{ss'}$, $C_{sa}$, $A_a$, $B_a$, and $D_{aa'}$ are the 
cofactors of $A^{ss'}$, $C^{sa}$, $A^{a}$, $B^{a}$, and $D^{aa'}$, respectively, in (3.2). The ellipsoid defined by 
$\sum_{i,j}\Lambda_{ij}(X_i - m_i)(X_j - m_j) = r_0$ ($r_0$ fixed and $> 0$) has $(n_a - 1)$ axes of equal 
length ($a = 1, \dots, h$), and each of the remaining $q$ axes is inclined to the 
coordinate axes so that its direction cosines have the same block symmetry as the 
set of diagonal elements in (3.2). 
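
The determinant formula (3.3) can be spot-checked numerically. The following sketch is an illustration, not part of the original derivation: it builds a block symmetric matrix of type I for the partition (1, 2, 3) from arbitrarily chosen entries, forms the reduced q x q matrix whose elements are $A_{ss'}$, $C_{sa}\sqrt{n_a}$, $A_a + (n_a - 1)B_a$ and $D_{aa'}\sqrt{n_a n_{a'}}$, and compares the two sides. All function names and numerical values are assumptions made for the example.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def type1_full(A_single, C, A_sub, B_sub, D, sizes):
    """Full t x t block symmetric matrix of type I."""
    labels = [("s", s) for s in range(len(A_single))]
    for a, n in enumerate(sizes):
        labels += [("a", a)] * n
    t = len(labels)
    full = [[0.0] * t for _ in range(t)]
    for i, (ki, xi) in enumerate(labels):
        for j, (kj, xj) in enumerate(labels):
            if ki == "s" and kj == "s":
                full[i][j] = A_single[xi][xj]
            elif ki == "s":
                full[i][j] = C[xi][xj]
            elif kj == "s":
                full[i][j] = C[xj][xi]
            elif xi != xj:
                full[i][j] = D[xi][xj]
            else:
                full[i][j] = A_sub[xi] if i == j else B_sub[xi]
    return full

def type1_reduced(A_single, C, A_sub, B_sub, D, sizes):
    """Reduced q x q matrix appearing on the right of (3.3)."""
    b = len(A_single)
    q = b + len(sizes)
    red = [[0.0] * q for _ in range(q)]
    for s in range(b):
        for s2 in range(b):
            red[s][s2] = A_single[s][s2]
        for a, n in enumerate(sizes):
            red[s][b + a] = red[b + a][s] = C[s][a] * math.sqrt(n)
    for a, n in enumerate(sizes):
        for a2, n2 in enumerate(sizes):
            if a == a2:
                red[b + a][b + a] = A_sub[a] + (n - 1) * B_sub[a]
            else:
                red[b + a][b + a2] = D[a][a2] * math.sqrt(n * n2)
    return red

# Hypothetical entries for the partition (1, 2, 3): b = 1, h = 2, t = 6, q = 3.
sizes = [2, 3]
A_single = [[5.0]]
C = [[0.5, 0.7]]
A_sub, B_sub = [4.0, 6.0], [1.0, 2.0]
D = [[0.0, 0.3], [0.3, 0.0]]

lhs = det(type1_full(A_single, C, A_sub, B_sub, D, sizes))
rhs = det(type1_reduced(A_single, C, A_sub, B_sub, D, sizes))
for a, n in enumerate(sizes):
    rhs *= (A_sub[a] - B_sub[a]) ** (n - 1)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

The check uses floating point arithmetic because of the square roots, so the comparison is made with a relative tolerance rather than exact equality.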

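When any of the $\bar{H}$ hypotheses is true, we represent $\|G^{ij}\|$ by $\|\Lambda^{ij}\|$ (also 
$\|G_{ij}\|$ by $\|\Lambda_{ij}\|$), which can be written as 

(3.4) 

$$\|\Lambda^{ij}\| = \left(\begin{array}{cccc|c|cccc}
A^{1} & B^{1} & \cdots & B^{1} & \cdots & C^{1h} & D^{1h} & \cdots & D^{1h} \\
B^{1} & A^{1} & \cdots & B^{1} & \cdots & D^{1h} & C^{1h} & \cdots & D^{1h} \\
\vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
B^{1} & B^{1} & \cdots & A^{1} & \cdots & D^{1h} & D^{1h} & \cdots & C^{1h} \\ \hline
\vdots & \vdots & & \vdots & \ddots & \vdots & \vdots & & \vdots \\ \hline
C^{h1} & D^{h1} & \cdots & D^{h1} & \cdots & A^{h} & B^{h} & \cdots & B^{h} \\
D^{h1} & C^{h1} & \cdots & D^{h1} & \cdots & B^{h} & A^{h} & \cdots & B^{h} \\
\vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
D^{h1} & D^{h1} & \cdots & C^{h1} & \cdots & B^{h} & B^{h} & \cdots & A^{h}
\end{array}\right)$$

where the blocks are formed by a partition $(n^h)$ of the $t$ rows and $t$ columns; thus 
each block is an $n \times n$ array. $\|\Lambda^{ij}\|$ and $\|\Lambda_{ij}\|$ have the same block pattern 
of symmetry. 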

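A matrix having the block pattern of symmetry of (3.4) will be called block 
symmetric of type II. The determinant of $\|\Lambda_{ij}\|$ is 

(3.5) 

$$|\Lambda_{ij}| = \begin{vmatrix}
A_1 - B_1 & C_{12} - D_{12} & \cdots & C_{1h} - D_{1h} \\
C_{21} - D_{21} & A_2 - B_2 & \cdots & C_{2h} - D_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
C_{h1} - D_{h1} & C_{h2} - D_{h2} & \cdots & A_h - B_h
\end{vmatrix}^{\,n-1}
\begin{vmatrix}
A'_1 & C'_{12} & \cdots & C'_{1h} \\
C'_{21} & A'_2 & \cdots & C'_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
C'_{h1} & C'_{h2} & \cdots & A'_h
\end{vmatrix},$$

where $A'_a = A_a + (n - 1)B_a$ and $C'_{aa'} = C_{aa'} + (n - 1)D_{aa'}$ ($a, a' = 
1, 2, \dots, h$; $a \ne a'$); $A_a$, $B_a$, $C_{aa'}$, and $D_{aa'}$ are the cofactors of $A^{a}$, $B^{a}$, $C^{aa'}$, 
$D^{aa'}$, respectively, in (3.4). 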
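
The factorization of the determinant of a block symmetric matrix of type II can be checked exactly with rational arithmetic. The sketch below is illustrative only: the function names and the entries are assumptions. It compares the determinant of the full nh x nh matrix with the product of the (n - 1)-th power of the h x h determinant with elements $A_a - B_a$, $C_{aa'} - D_{aa'}$ and the h x h determinant with elements $A_a + (n - 1)B_a$, $C_{aa'} + (n - 1)D_{aa'}$.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def type2_full(A, B, C, D, n):
    """Full nh x nh block symmetric matrix of type II."""
    t = n * len(A)
    full = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            a, p = divmod(i, n)     # subset and position within the subset
            a2, p2 = divmod(j, n)
            if a == a2:
                full[i][j] = A[a] if p == p2 else B[a]
            else:
                full[i][j] = C[a][a2] if p == p2 else D[a][a2]
    return full

def type2_reduced(A, B, C, D, weight):
    """h x h matrix with elements A_a + weight*B_a and C_aa' + weight*D_aa'."""
    h = len(A)
    return [[A[a] + weight * B[a] if a == a2 else C[a][a2] + weight * D[a][a2]
             for a2 in range(h)] for a in range(h)]

# Hypothetical integer entries: h = 2 subsets of n = 3 variates each.
n = 3
A, B = [10, 12], [2, 3]
C = [[0, 4], [4, 0]]
D = [[0, 1], [1, 0]]

lhs = det(type2_full(A, B, C, D, n))
rhs = det(type2_reduced(A, B, C, D, -1)) ** (n - 1) * det(type2_reduced(A, B, C, D, n - 1))
print(lhs == rhs)
```

Because the entries are integers and the arithmetic is exact, the two sides can be compared with `==`.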


4. Method of obtaining the sample criteria. The probability distribution, 
$P$, of a simple random sample, say $O_N$: $(X_{1\alpha}, X_{2\alpha}, \dots, X_{t\alpha})$ ($\alpha = 1, 2, \dots, N$), 
from $\Pi$ is 

(4.1) $P = \pi^{-Nt/2}\,|G_{ij}|^{N/2}\exp\Big[-\sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j)\Big]$. 

For $O_N$ fixed, $P$ is the likelihood function of the parameters $m_1, m_2, \dots, m_t$ 
and $G_{ij}$ ($i, j = 1, 2, \dots, t$). To obtain sample criteria for testing the $H$ and $\bar{H}$ 
hypotheses we shall employ the Neyman-Pearson likelihood-ratio method. The 
details of applying this method will be given for only one of the hypotheses, since 
the technique of application is the same for all the hypotheses under 
consideration. 

In applying the likelihood-ratio method we maximize $P$ under two different 
sets of conditions and form the ratio of the two maxima. To derive a criterion 
for, say, $H_1(mvc)$, we first maximize $P$ over the set, $\Omega$, of admissible values of the 
parameters in (4.1); secondly, we maximize $P$ over the set, $\omega$, of admissible values 
of the parameters in (4.1) that satisfy $H_1(mvc)$. Let $P_\Omega$ and $P_\omega$ be these maxima, 
respectively. The likelihood-ratio criterion for $H_1(mvc)$ is $\lambda_1(mvc) = P_\omega/P_\Omega$; 
thus $0 \le \lambda_1(mvc) \le 1$. The sample criterion, $L_1(mvc)$, for $H_1(mvc)$ will be chosen 
as a single-valued function of $\lambda_1(mvc)$. 


4a. Derivation of the criterion $L_1(mvc)$. The parameter spaces, $\Omega$ and $\omega$, can 
be specified as follows: 

$\Omega$: (1) $\|G_{ij}\|$ positive definite; 
(2) $-\infty < m_i < +\infty$ ($i = 1, 2, \dots, t$). 

$\omega$: (1) $\|G_{ij}\|$ positive definite and block symmetric (of type I); 
(2) $-\infty < m_i < +\infty$; $(m_1, m_2, \dots, m_t)$ block symmetric. 

The block symmetries in $\omega(1)$ and $\omega(2)$ are for the same partition $(1^b, n_1, \dots, n_h)$ 
of the $t$ variates (see sections 2 and 3). 

Maximizing $P$ is equivalent to maximizing 

(4.2) $L = \ln P = -(Nt/2)\ln\pi + (N/2)\ln|G_{ij}| - \sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j)$. 

Solving the simultaneous equations $\partial L/\partial m_i = 0$ ($i = 1, \dots, t$) and $\partial L/\partial G_{ij} = 0$ 
($i, j = 1, \dots, t$; $j \ge i$) for the parameters, we have 

(4.3) 
$\hat{m}_i = (1/N)\sum_{\alpha=1}^{N} X_{i\alpha} = \bar{X}_i$, 
$(N/2)\hat{G}^{ij} = \sum_{\alpha=1}^{N}(X_{i\alpha} - \bar{X}_i)(X_{j\alpha} - \bar{X}_j) = v_{ij}$; 

substituting these values of the parameters into (4.1) we find that 

(4.4) $P_\Omega = \pi^{-Nt/2}\,|2v_{ij}/N|^{-N/2}\exp[-Nt/2]$. 

In (4.3) and (4.5) each expression at the extreme right is defined by the 
corresponding expressions at the left. 

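In $\omega(2)$ there are $b + h$ groups of means, the means within a group being all 
equal; let $m'_s$ be the $s$-th mean and $m'_{b+a}$ be the common value of the means in the 
$(b + a)$-th group. Solving the simultaneous equations $\partial L/\partial m'_s = 0$, 
$\partial L/\partial m'_{b+a} = 0$, $\partial L/\partial A_{ss'} = 0$, $\partial L/\partial C_{sa} = 0$, $\partial L/\partial A_a = 0$, $\partial L/\partial B_a = 0$, 
$\partial L/\partial D_{aa'} = 0$ ($s, s' = 1, \dots, b$; $a, a' = 1, \dots, h$; $a \ne a'$), we find that 

(4.5) 
$\hat{m}'_s = \bar{X}_s$, 
$\hat{m}'_{r_a} = (1/Nn_a)\sum_{\alpha, i_a} X_{i_a\alpha} = \bar{X}_{r_a}$, 
$(N/2)\hat{A}^{ss'} = \sum_{\alpha=1}^{N}(X_{s\alpha} - \bar{X}_s)(X_{s'\alpha} - \bar{X}_{s'}) = v_{ss'}$, 
$(N/2)\hat{C}^{sa} = (1/n_a)\sum_{\alpha, i_a}(X_{s\alpha} - \bar{X}_s)(X_{i_a\alpha} - \bar{X}_{r_a}) = u_{sa}$, 
$(N/2)\hat{A}^{a} = (1/n_a)\sum_{\alpha, i_a}(X_{i_a\alpha} - \bar{X}_{r_a})^2 = v_a$, 
$(N/2)\hat{B}^{a} = [1/n_a(n_a - 1)]\sum_{\alpha, i_a \ne j_a}(X_{i_a\alpha} - \bar{X}_{r_a})(X_{j_a\alpha} - \bar{X}_{r_a}) = w_a$, 
$(N/2)\hat{D}^{aa'} = (1/n_a n_{a'})\sum_{\alpha, i_a, i_{a'}}(X_{i_a\alpha} - \bar{X}_{r_a})(X_{i_{a'}\alpha} - \bar{X}_{r_{a'}}) = z_{aa'}$, 

where $i_a, j_a = b + \bar{n}_a + 1, \dots, b + \bar{n}_a + n_a$; $i_a \ne j_a$; $\bar{n}_a = n_1 + \dots + n_{a-1}$, 
$\bar{n}_1 = 0$; $r_a = b + a$; $a, a' = 1, \dots, h$; $a \ne a'$. 

When $H_1(mvc)$ is true, the maximum likelihood estimates of $m_i$, $\sigma_i$, and 
$\rho_{ij}$ ($i, j = 1, \dots, t$) would be obtained by means of (4.5) and the definition of 
$\|G^{ij}\|$ given just after (3.1). 

Substituting the expressions in (4.5) into (4.1) we find that 

(4.6) $P_\omega = \pi^{-Nt/2}\,|2v'_{ij}/N|^{-N/2}\exp[-Nt/2]$, 

where 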





(4.7) 

$$\|v'_{ij}\| = \left(\begin{array}{ccc|cccc|c|cccc}
v_{11} & \cdots & v_{1b} & u_{11} & u_{11} & \cdots & u_{11} & \cdots & u_{1h} & u_{1h} & \cdots & u_{1h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\
v_{b1} & \cdots & v_{bb} & u_{b1} & u_{b1} & \cdots & u_{b1} & \cdots & u_{bh} & u_{bh} & \cdots & u_{bh} \\ \hline
u_{11} & \cdots & u_{b1} & v_{1} & w_{1} & \cdots & w_{1} & \cdots & z_{1h} & z_{1h} & \cdots & z_{1h} \\
u_{11} & \cdots & u_{b1} & w_{1} & v_{1} & \cdots & w_{1} & \cdots & z_{1h} & z_{1h} & \cdots & z_{1h} \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & & \vdots \\
u_{11} & \cdots & u_{b1} & w_{1} & w_{1} & \cdots & v_{1} & \cdots & z_{1h} & z_{1h} & \cdots & z_{1h} \\ \hline
\vdots & & \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots & & \vdots \\ \hline
u_{1h} & \cdots & u_{bh} & z_{h1} & z_{h1} & \cdots & z_{h1} & \cdots & v_{h} & w_{h} & \cdots & w_{h} \\
u_{1h} & \cdots & u_{bh} & z_{h1} & z_{h1} & \cdots & z_{h1} & \cdots & w_{h} & v_{h} & \cdots & w_{h} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \ddots & \vdots \\
u_{1h} & \cdots & u_{bh} & z_{h1} & z_{h1} & \cdots & z_{h1} & \cdots & w_{h} & w_{h} & \cdots & v_{h}
\end{array}\right)$$

















From (4.4) and (4.6) it follows that the likelihood-ratio criterion for $H_1(mvc)$ is 

(4.8) $\lambda_1(mvc) = \big[\,|v_{ij}| / |v'_{ij}|\,\big]^{N/2}$ ($i, j = 1, \dots, t$). 

Finally, as the sample criterion for $H_1(mvc)$ we choose 

(4.9) $L_1(mvc) = [\lambda_1(mvc)]^{2/N} = |v_{ij}| / |v'_{ij}|$. 
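
As an illustration of how the criterion might be computed from data, the sketch below evaluates the ratio of the determinant of the ordinary matrix of sums of products about the sample means to the determinant of the matrix obtained by pooling means, variances and covariances within the blocks of the partition. It is a sketch under stated assumptions, not the author's computing scheme: the function names, the 0-based column indexing, and the six observations are all hypothetical.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def cross_products(rows, centers):
    """Sums of products of deviations from the given centers."""
    t = len(centers)
    return [[sum((r[i] - centers[i]) * (r[j] - centers[j]) for r in rows)
             for j in range(t)] for i in range(t)]

def l1_mvc(rows, groups):
    """Ratio |v| / |v'| for a partition given as lists of 0-based columns."""
    N, t = len(rows), len(rows[0])
    data = [[Fraction(x) for x in r] for r in rows]
    means = [sum(r[i] for r in data) / N for i in range(t)]
    v = cross_products(data, means)
    pooled = [None] * t                      # common mean within each subset
    for g in groups:
        m = sum(means[i] for i in g) / len(g)
        for i in g:
            pooled[i] = m
    c = cross_products(data, pooled)
    v_prime = [[None] * t for _ in range(t)]
    for g in groups:
        for g2 in groups:
            if g is g2:
                diag = sum(c[i][i] for i in g) / len(g)
                pairs = [(i, j) for i in g for j in g if i != j]
                off = sum(c[i][j] for i, j in pairs) / len(pairs) if pairs else None
                for i in g:
                    for j in g:
                        v_prime[i][j] = diag if i == j else off
            else:
                avg = sum(c[i][j] for i in g for j in g2) / (len(g) * len(g2))
                for i in g:
                    for j in g2:
                        v_prime[i][j] = avg
    return det(v) / det(v_prime)

# Hypothetical sample: N = 6 observations on t = 3 variates, partition (1, 2).
sample = [[1, 2, 3], [2, 1, 4], [3, 5, 2], [4, 3, 6], [5, 6, 5], [6, 4, 7]]
L1 = l1_mvc(sample, groups=[[0], [1, 2]])
print(L1, float(L1))
```

Values of the criterion near 1 are what one would expect when the sample matrix already has nearly the pooled pattern; the criterion always lies between 0 and 1 when the sample matrix is nonsingular.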


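4b. Preliminary calculations for evaluation of moments of $L_1(mvc)$. The 
determinant $|v'_{ij}|$ in (4.9) is block symmetric. From (3.2), (3.3), and (4.9) it follows 
that 

(4.10) $L_1(mvc) = \dfrac{|v_{ij}|}{|v''_{rr'}|\,\prod_{a=1}^{h}(v_a - w_a)^{n_a - 1}}$, 

where 

$v''_{ss'} = v_{ss'}$, 
$v''_{s r_a} = \sqrt{n_a}\,u_{sa}$, 
$v''_{r_a r_a} = v_a + (n_a - 1)w_a$, 
$v''_{r_a r_{a'}} = \sqrt{n_a n_{a'}}\,z_{aa'}$, 

($s, s' = 1, \dots, b$; $r, r' = 1, \dots, b + h$; $r_a = b + a$; $a = 1, \dots, h$). 

Let $Y_{i\alpha} = X_{i\alpha} - m_i$ and $\bar{Y}_i = (1/N)\sum_{\alpha=1}^{N} Y_{i\alpha}$ ($i = 1, \dots, t$). Clearly 
$v_{ij} = \sum_{\alpha=1}^{N}(Y_{i\alpha} - \bar{Y}_i)(Y_{j\alpha} - \bar{Y}_j)$. When $H_1(mvc)$ is true, $u_{sa}$, $v_a$, $w_a$, and $z_{aa'}$, 
in $L_1(mvc)$, can be expressed exactly as they are expressed in (4.5) with $Y$ 
substituted for $X$, and $(v_a - w_a)$ and $v''_{rr'}$ in (4.10) can be expressed as follows: 

(4.11) 
$v_a - w_a = (1/n_a)\Big\{\sum_{i_a} v_{i_a i_a} - [1/(n_a - 1)]\sum_{i_a \ne j_a} v_{i_a j_a}\Big\} + (N/n_a)\sum_{i_a}\bar{Y}_{i_a}^2 - [N/n_a(n_a - 1)]\sum_{i_a \ne j_a}\bar{Y}_{i_a}\bar{Y}_{j_a}$, 
$v''_{ss'} = v_{ss'}$, 
$v''_{s r_a} = (1/\sqrt{n_a})\sum_{i_a} v_{s i_a}$, 
$v''_{r_a r_a} = (1/n_a)\sum_{i_a, j_a} v_{i_a j_a}$, 
$v''_{r_a r_{a'}} = (1/\sqrt{n_a n_{a'}})\sum_{i_a, i_{a'}} v_{i_a i_{a'}}$. 

From (4.10) and (4.11) it follows that when $H_1(mvc)$ is true, each element of the 
determinants on which $L_1(mvc)$ depends consists of: (a) a quadratic form in the $\bar{Y}_i$ 
and a linear function of the $v_{ij}$, or (b) merely a linear function of the 
$v_{ij}$ ($i, j = 1, \dots, t$). 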

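The joint probability density function of the $v_{ij}$ and $\bar{Y}_i$ is 

(4.12) $f(v_{ij})\,g(\bar{Y}_1, \dots, \bar{Y}_t)$, 

where 

$f(v_{ij}) = \dfrac{|G_{ij}|^{(N-1)/2}\,|v_{ij}|^{(N-t-2)/2}\exp\big[-\sum_{i,j} G_{ij}v_{ij}\big]}{\pi^{t(t-1)/4}\,\Gamma\big(\frac{N-1}{2}\big)\Gamma\big(\frac{N-2}{2}\big)\cdots\Gamma\big(\frac{N-t}{2}\big)}$ 

($\|G_{ij}\|$ positive definite; $N > t$), which is the Wishart distribution [9, p. 120], and 

$g(\bar{Y}_1, \dots, \bar{Y}_t) = (N/\pi)^{t/2}\,|G_{ij}|^{1/2}\exp\big[-N\sum_{i,j} G_{ij}\bar{Y}_i\bar{Y}_j\big] = g(\bar{Y})$, say, 

which is a normal $t$-variate distribution. The $d$-th moment ($d = 0, 1, \dots$) 
of $L_1(mvc)$, when $H_1(mvc)$ is true, is 

(4.13) $E[L_1(mvc)]^d = \int_R f(v_{ij})\,g(\bar{Y})\,|v_{ij}|^{d}\,|v''_{rr'}|^{-d}\prod_{a=1}^{h}(v_a - w_a)^{-d(n_a - 1)}\prod_{i \le j} dv_{ij}\prod_{i} d\bar{Y}_i$, 

where the domain, $R$, of integration is $-\infty < \bar{Y}_i < +\infty$, $\|v_{ij}\|$ positive 
semi-definite ($i, j = 1, \dots, t$). The integral in (4.13) is evaluated in section 6 (by 
means of Wilks' moment-generating operators) for the case where $H_1(mvc)$ 
is true. 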


5. Remarks on Wilks' Moment-Generating Operators. Wilks' operators 
are applicable to a far wider class of problems than those treated in this paper. 
The following discussion is confined to a special use of the operators. 

From (4.12) it follows that 

(5.1) $\int_{R'} |v_{ij}|^{(N-t-2)/2}\exp\big[-\sum_{i,j} G_{ij}v_{ij}\big]\prod_{i \le j} dv_{ij} = \dfrac{\pi^{t(t-1)/4}\prod_{i=1}^{t}\Gamma[(N-i)/2]}{|G_{ij}|^{(N-1)/2}}$, 

where $R'$ is the region in the space of $v_{ij}$ for which $\|v_{ij}\|$ is positive definite, and 
$\|G_{ij}\|$ is positive semi-definite. (Of course, the probability that $\|v_{ij}\|$ is not 
positive definite is 0.) Let $G'_{ij} = G_{ij} + \beta_{ij}$ ($i, j = 1, \dots, t$); if all the $\beta_{ij}$ are 
sufficiently small, $\|G'_{ij}\|$ is positive definite, and we have 

(5.2) $\dfrac{|G_{ij}|^{(N-1)/2}\int_{R'} |v_{ij}|^{(N-t-2)/2}\exp\big[-\sum_{i,j} G'_{ij}v_{ij}\big]\prod_{i \le j} dv_{ij}}{\pi^{t(t-1)/4}\prod_{i=1}^{t}\Gamma[(N-i)/2]} = |G_{ij}|^{(N-1)/2}\,|G'_{ij}|^{-(N-1)/2}$, 

which is $E(g)$, where $g = \exp\big[-\sum_{i,j}\beta_{ij}v_{ij}\big]$. 

Let $I_\beta$ be an operator (whose operand is a function of all the $\beta_{ij}$) which 
represents the following set of operations: (a) replacement of each $\beta_{ij}$ in the operand 
by $\beta_{ij} + \xi_i\xi_j$, (b) integration (of the result of (a)) with respect to $\xi_i$ ($i = 1, \dots, t$) 
from $-\infty$ to $+\infty$, (c) multiplication of the result of (a) and (b) by $\pi^{-t/2}$. From 
(3.1) it follows that 

(5.3) $I_\beta(g) = I_\beta\big(\exp\big[-\sum_{i,j}\beta_{ij}v_{ij}\big]\big) = g\,|v_{ij}|^{-1/2}$ ($\|v_{ij}\|$ pos. def.); 

and if all the $\beta_{ij}$ are set equal to 0 after performing the $I$-operations, then $g = 1$ 
and (5.3) yields $|v_{ij}|^{-1/2}$. Let $I_\beta^\lambda$ be $\lambda$ repetitions ($\lambda = 1, 2, \dots$) of $I_\beta$. 
Clearly, 

(5.4) $[I_\beta^\lambda(g)]_{\beta_{ij}=0} = \big[g\,|v_{ij}|^{-\lambda/2}\big]_{\beta_{ij}=0} = |v_{ij}|^{-\lambda/2}$. 

Under all conditions of their use in this paper the $I$ operations are interchangeable 
with the $E$ operation [8, p. 316]; thus, 

$E[I_\beta^\lambda g] = I_\beta^\lambda[E(g)]$. 

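From (5.2), (5.4), and [8, pp. 318-320] we have 

(5.5) $E\big[|v_{ij}|^{-\lambda/2}\big] = |G_{ij}|^{(N-1)/2}\big\{I_\beta^\lambda\,|G'_{ij}|^{-(N-1)/2}\big\}_{\beta_{ij}=0} = |G_{ij}|^{\lambda/2}\prod_{i=1}^{t}\psi[N - i, -\lambda]$, 

where $N > t + \lambda + 1$ and $\psi(R, S) = \Gamma[(R + S)/2]/\Gamma(R/2)$. 

The operator $I_\beta^\lambda$ may be used, as indicated above, to find negative half-integer 
moments of $|v_{ij}|$. To obtain positive half-integer moments of $|v_{ij}|$ we may 
use an inverse operator $I_\beta^{-\lambda}$ [8, pp. 321-323] ($\lambda = 1, 2, \dots$) which has been 
defined in such a way that 

(5.6) $|G_{ij}|^{(N-1)/2}\big\{I_\beta^{-\lambda}\,|G'_{ij}|^{-(N-1)/2}\big\}_{\beta_{ij}=0} = E\big[|v_{ij}|^{\lambda/2}\big] = |G_{ij}|^{-\lambda/2}\prod_{i=1}^{t}\psi[N - i, \lambda]$. 

The equality between the second and third expressions in (5.6) can be obtained 
from (5.1) by replacing $N$ by $N + \lambda$ (see [7]). 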
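
The closed forms for the half-integer moments of the determinant can be evaluated directly from gamma functions. The sketch below is an illustration under a stated assumption: it takes $\psi(R, S) = \Gamma[(R + S)/2]/\Gamma(R/2)$, and it writes both the negative and the positive moments as one expression, $E|v_{ij}|^{\lambda/2} = |G_{ij}|^{-\lambda/2}\prod_i \psi(N - i, \lambda)$, with $\lambda$ allowed to be negative. The function names are hypothetical.

```python
import math

def psi(R, S):
    """Gamma((R + S)/2) / Gamma(R/2), computed through log-gamma."""
    return math.exp(math.lgamma((R + S) / 2) - math.lgamma(R / 2))

def det_moment(det_G, N, t, lam):
    """E|v|^(lam/2) for a sample of size N on t variates; lam may be negative.

    det_G is the determinant of the matrix G, where the inverse of G is
    twice the variance-covariance matrix.
    """
    if N - t + lam <= 0:
        raise ValueError("moment does not exist for these N, t, lam")
    factor = 1.0
    for i in range(1, t + 1):
        factor *= psi(N - i, lam)
    return det_G ** (-lam / 2) * factor

# One variate with variance 3, so G = 1/(2*3); N = 10 observations.
# The sum of squares about the mean then has expectation (N - 1) * 3 = 27.
print(det_moment(det_G=1 / 6, N=10, t=1, lam=2))
```

For a single variate the sum of squares about the mean is a multiple of a chi-square variate, which gives an independent check of the first moment and of the first inverse moment.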

In (5.5) and (5.6) the $\beta$'s are not necessary; however, in (4.13) and in similar 
expressions for the moments of the other criteria there are several determinants, 
each determinant requires a distinct $I$-operator, and it is of great convenience to 
introduce a distinct set of $\beta$'s for each $I$ operator. The $\beta$'s associated with a 
given operator may initially appear in more than one of the determinants in the 
operand. The order in which several $I$-operators are used is illustrated in the 
following case for two: 

(5.7) $\Big[I_{\beta}^{\lambda}\,|G'_{ij}|^{-M}\Big(\big[I_{\beta'}^{\mu}\,|G''_{ij}|^{-M'}\big]_{\beta'_{ij}=0}\Big)\Big]_{\beta_{ij}=0}$, 

where $\lambda, \mu > 0$ and the values of $M$ and $M'$ are such that the value of the 
expression is well defined. The notation in (5.7) means that $I_{\beta'}^{\mu}$ is applied to $|G''_{ij}|^{-M'}$, 
the $\beta$'s associated with $I_{\beta'}^{\mu}$ are set equal to zero, then $I_{\beta}^{\lambda}$ is applied to the product 
of $|G'_{ij}|^{-M}$ and the results of the previous operations, and then the $\beta$'s associated 
with $I_{\beta}^{\lambda}$ are set equal to zero. The interchangeability of the order of $I$ 
operations is discussed in [8, p. 324]. 


6. The moments and distribution of $L_1(mvc)$ when $H_1(mvc)$ is true. To
evaluate (4.13) we let

(6.1) CJ = exp [- X) - X - «!) - X v'lr]. 

I.J a T,T> 

From (4.11) and (4.12) we have 

(6 2) E{c,) = iA.,r 

where 


a'.,. 

= 

Aai* 

+ 

ft.' + d'/.', 



= 


+ 

^ata ^9ral'\/'^ay 



= 

Aa 

+ 

Plata "1“ Pal'^a ~\~ PtaTa/'^O-y 



= 


+ 

^iaia A/(w<i l)na ”{" ^raVa/^a 


= 

Baa' 

' + 

PtaJa' PraTa'/'\/'^^a'f^a') 

(a 7^ 

A.iit 

= 

Aes> 

} 



a'' 

xlata 

- 

C.a: 

1 




= 

Aa 

+ 

Pa/'^^a ) 


a" 

= 

Ba 

~ 

Pa/^la(j^a ' l), {la Ja)> 


A" . 

= 

Baa' 

' 3 

(a 7^ a'). 



When Hi{mvc) is true, we have 


l?[L:(mi;c)]'' = I 




a=l 


-(W- 1 / 2 )', 

Jb,/-o 



(6 3) 


= {nV'(iV - f, 2 d)||n^(W + 2d-r, ~2d)j 
X (nm + 2d)(nu - 1), - 2d(na - l)]}|n(na - l)‘''”““'j, 


(d - 0, 1, 2, ■ ■ ■ ; W > i), 

where g = h + h and S) is defined in (5 5). In (6.3) the assumption that 
Hi{mvc) is true implies that after we apply 77 )*'* and set the equal to 0 all 
remaining determinants are block symmetric; we may then use (3 3 ) before, 



COMPOUND SYMMETRY 


461 


applying and {a = I, ■ ■ , h) The expression in (0 3) may be 

written as follows' 


(6.4) 


E[Li{mvc)Y 

'N - i 


t r 

" - i 


+ d 




^ ( Njn. - 1 ) 

n_V_ 2 _ 


(Uin. - 1 )^'’““-'= 


h Ua —1 

=nn 

a“l «ae=l 


{{— q — Sa — Ua + a — ] 

If, (Sa - D 

+ 


2 ' (jla - l)Jd 


where $\bar{n}_a$ is defined in (4.5) and $(T)_d = \Gamma(T + d)/\Gamma(T)$.

We now consider the problem of identifying from (6.4) the distribution of
$L_1(mvc)$ (when $H_1(mvc)$ is true). Let $\theta$ be a beta variate, i.e., a variate whose
c.d.f., $F(\theta)$, is

(6.5) $F(\theta) = I_\theta(P, Q)$, $(0 < \theta < 1;\ P, Q > 0)$,

which is the Incomplete Beta Function ratio. $I_\theta(P, Q)$ is tabulated in [1]
and [3]. The d-th moment of $\theta$ is:

(6.6) $E(\theta^d) = \dfrac{\Gamma(P + d)\,\Gamma(P + Q)}{\Gamma(P)\,\Gamma(P + Q + d)} = (P)_d/(P + Q)_d$,

$(d = 0, 1, \ldots)$. Let

(6.7) $\tau = \prod_{j=1}^{c} \theta_j$ $(c = 1, 2, \ldots)$,

where the $\theta_j$ $(j = 1, \ldots, c)$ are mutually independent and each $\theta_j$ is a beta
variate, having parameters $p_j$, $q_j$, say. The d-th moment of $\tau$ is

(6.8) $E(\tau^d) = \prod_{j=1}^{c} (p_j)_d/(p_j + q_j)_d$, $(d = 0, 1, \ldots)$.
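The moment formulas (6.6) and (6.8) can be evaluated numerically. The following is a minimal sketch; the function names and the parameter values are hypothetical and serve only as an illustration.

```python
from math import lgamma, exp


def rising(t, d):
    """(t)_d = Gamma(t + d) / Gamma(t), computed through log-gamma."""
    return exp(lgamma(t + d) - lgamma(t))


def beta_moment(p, q, d):
    """d-th moment of a beta variate with parameters p, q, as in (6.6)."""
    return rising(p, d) / rising(p + q, d)


def product_moment(params, d):
    """d-th moment of a product of independent beta variates, as in (6.8)."""
    result = 1.0
    for p, q in params:
        result *= beta_moment(p, q, d)
    return result


# Hypothetical parameters (p_j, q_j), chosen only for illustration.
example_params = [(2.0, 3.0), (1.0, 1.0)]
first_moment = product_moment(example_params, 1)
```

With these illustrative parameters the first moment of the product is the product of the two means, 2/5 and 1/2.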


Given a variate, say $\mu$ $(0 \le \mu \le 1)$, whose d-th moment $(d = 0, 1, \ldots)$ is given
by (6.8), we can infer by means of the solution of the Hausdorff problem of moments
that $\mu$ and $\tau$ have the same exact probability distribution function (see
Corollary 1.1 [2, p. 11]). It should be noted that (6.4) can be written as

(6.9) $E[L_1(mvc)]^d = \prod_{a=1}^{h} \prod_{s_a=1}^{n_a - 1} \left[(p_{a s_a})_d / (p_{a s_a} + q_{a s_a})_d\right]$,





where 


$p_{a s_a} = (N - q - s_a - \bar{n}_a + a - 1)/2 > 0$,


(Zasa 


(sfl ~ 1) I g + s, 


L(n. - 1) 


+ 


+ ng — 
2 


+ 


l' 


> 0; 


thus (6.4) is a special case of (6.8).

The exact probability (density) function, say $g(\tau)$, of $\tau$ has been obtained by
Wilks [7, p. 475] and is:

g(r) - (1 - T)f“-’"-‘ ■ f\r'vr' ■ 

Jo Jo 

X (1 - yi)‘’‘'‘“'’'-''‘(l - ■ (1 - 


(6.10) X [1 - «i(i - r)r‘~^’-" [1 - (vi + v,(i - i»i)) (1 - ■ ■. 


X [1 ~ {wi + ^2(1 — Wl) + • + Uc-l(l — t'l)(l ~ II2) • ■ (1 ~ l’c-2)} 


c—1 

X n dW/, 
,-1 



(1 - r)’’”-'-'"'-*"] 


f; = 2 (p=-j' + Qi-r), 

,'~a 


Vi — H fo-i'- An approximation of the distribution of a product of inde- 
0 

pendent beta variates by the distribution of a single beta variate is given in [4]. 

The results of this section may be summarized as follows: If $H_1(mvc)$ is true,
the d-th moment $(d = 0, 1, \ldots)$ of the exact distribution of $L_1(mvc)$ is given by
(6.4). Also, if $H_1(mvc)$ is true, the exact distribution of $L_1(mvc)$ is given by
(6.10), where the $p_j$, $q_j$, and $c$ can be specified by means of (6.4). The cumulative
distribution of $L_1(mvc)$ is given for certain special cases in section 7g.


7. Single Sample Criteria. The solutions of problems (i) and (ii) (see section
1) for $H_1(mvc)$ are contained in (4.9) and the summary at the end of section 6.
In the present section solutions of problems (i) and (ii) are given for each of the
remaining two $H_1$ hypotheses and the three $\bar{H}_1$ hypotheses (all of which are stated
in section 2). For any of the hypotheses the sample criterion is chosen as a
single-valued function of the likelihood-ratio criterion for the hypothesis. The
methods of determining the moments and identifying the distribution of each
sample criterion (when the corresponding null hypothesis is true) are entirely
similar to those used in sections 4, 5, and 6 in regard to $H_1(mvc)$. Section 7g
gives the exact distributions of the single-sample criteria for certain special
compound symmetries.

Each criterion discussed in this section is based on a sample
$O_N$: $(X_{1\alpha}, X_{2\alpha}, \ldots, X_{t\alpha})$ $(\alpha = 1, \ldots, N;\ N > t)$





of size N from a normal t-variate distribution $(t = 3, 4, \ldots)$. As in the case of
$H_1(mvc)$, it is presupposed for testing $H_1(vc)$ or $H_1(m)$ that there is a certain
partition $(1^b, n_1, \ldots, n_h)$ of the t variates; for testing $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$,
or $\bar{H}_1(m)$ it is presupposed that there is a certain partition $(n^h)$ of the t variates
(see sections 2 and 3).


7a. The test $L_1(vc)$ for the hypothesis $H_1(vc)$. For the sample criterion for
$H_1(vc)$ we choose

(7 1) Li(vc) = [\i(vc)]-"' = 1 I / \v,, 1, (b J = 1. • ■ I i) 


where Ai(dc) is the likelihood-ratio criterion for Hi(vc), is defined in (4.3), and 


i)gs' — Vaa' j 

Ja 

^tata ~ (i/rla) J 

la 

~ [l/^o(^2.ff l)] 23 

(Sj — 1, * , hj a, “ 1, • * * , hj a 5^ a j to, , ^'a) Ja “ "h flo ”1" 1, ‘ ' j 

h + ria+i ,na = ni+ • • + n„_i; fii = 0). Since 1| i),, |1 is a block symmetric 
matrix, there is an expression for 1 di, \ that is entirely similar in form to the 
expression in (3.3) for | A,, j (see also (4.9) and (4.10)). 

If Hi{vc) is true, 


E[L,(vc)Y ^ In'I'iN - i, 2d) 


(7 2) 


\n^[{N - 1 + 2d)(no - 1), -2d(?ia - 1)] 

Xlllm -r + 2d, -2d]) |n (tIo - 


f/iV — g — So — Tto + a — l \ 

^TtJ\ _2-(d = 0,l, 


= nn 


(N-l , (Sa- 1 ) \ 

I V 2 (?ta - 1) /<; 


where $q = b + h$, and $\psi(R, S)$, $\bar{n}_a$, and $(T)_d$ are defined in (5.5), (4.5), and (6.4),
respectively. From (7.2) and the argument given after (6.8) it follows that
if $H_1(vc)$ is true, the exact distribution of $L_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.2).





7b. The test $L_1(m)$ for the hypothesis $H_1(m)$. For the sample criterion for
$H_1(m)$ we choose


(7 3) 


Li(m) = 



(h.7 = l, ■■,1), 


where Xi(m) is the likelihood-ratio criterion for ili(m) and v',j and v,, are defined 
in (4.7) and (7.1), respectively. In passing we note that 


(7.4) $[L_1(m)][L_1(vc)] = L_1(mvc)$.


If $H_1(m)$ is true,

E[L,{ni)Y = n IfKfV - l)(n« - 1). 2a(n„ - 1)] 

a=>l 


(7.5) 


X 'PHria — l)(iV -k 2a), —2aina — 1)]} 

- 




77 — 1 Sa 


a=l 


, \ 2 Ha. ijd 


(d = 0, 1, ..). 


If $H_1(m)$ is true, the exact distribution of $L_1(m)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.5). It follows from (7.5) that the
exact distribution of $L_1(m)$, when $H_1(m)$ is true, does not depend on $b$.


7c. The test $\bar{L}_1(mvc)$ for the hypothesis $\bar{H}_1(mvc)$. The sample criterion, $\bar{L}_1(mvc)$,
for $\bar{H}_1(mvc)$ (see section 2) is

(7.6) Liimvc) = = | v., | / \v[, |, {i,j = 1, • • ,t) 

where %i{mvc) is the likelihood-ratio criterion for Hiiinvc), is defined in (4.3), 
and 


= (l/«) E (X,„„ - X'a)\ 

<x,U 



= [l/7i(n 

- 1)] E - X'a){Xi'a - X'a, 

), 


(ia 

^ h): 








-1 

= (1/n) 

E, {X,^a - X'a)(X,i.a - X:,), 






O' 








oc 

Ja “b 

n{a' - 

- a); a 

X a') 

_/ 

= [l/n{n 

- 1)] E, (X,„a - X:)(Z„p„ - i 

■'•0 /; 






(fC 

^Ja + 

n(a - 

- o'), a 

7^ a'): 

(a = 

1, ■ • ■ , h, 

ia , Ja , ha , ka = (» — 1)« + 1 , • • • 

, an', ha 

' = 'la 

+ n(a' 

- a). 


ha' 9^ -\r n{a' — a); a = 1, • • • , fV). j||| is a block symmetric matrix, 

of type II (see (3 4)), in which the blocks are formed by a partition {nf) (t = nh) 
of the rows and columns; there is an expression for \vi]\ that is entirely similar 
in form to the expression in (3 5) for | Aij \ 





If $\bar{H}_1(mvc)$ is true,


(7.7) 


E[L,(,mvc)r = (n - 1)'“'"'“^'[H ^ 2rf)j 

X |n V'[(A?' + 2d){n - ]) + 1 - a, -2d{n 


h n—1 

= nn 

a“l «*=1 


^ ~h~s-{n- l)(a - 1 )^ 


(f 


+ ~ ° 4 - 

^ 2(n - 1) 




(d = 0,l,--.). 

If $\bar{H}_1(mvc)$ is true, the exact distribution of $\bar{L}_1(mvc)$ is given by (6.10), where
the $p_j$, $q_j$, and $c$ can be specified by means of (7.7).


7d. The test $\bar{L}_1(vc)$ for the hypothesis $\bar{H}_1(vc)$. The sample criterion, $\bar{L}_1(vc)$, for
$\bar{H}_1(vc)$ (see section 2) is

^^■8) Uvc) = [Xi(iic)f= ho 1 / I 5,1 I (^, J = 1 , ... , t)^ 

where Xi(iic) is the hkehhood-ratio criterion for 2i{vc), v„ is defined in (4.3), and 

— (l/n) 2 , 

Ja 

Ku = ll/n(n - 1)] E 

~ (l/n.) 2 W/.ii', (fc(' == jo + n(a' — a); a o'), 

= [\/n{n — 1 )] 2 '»UK', {h[, 7^ jo + n(a' — o); o o'), 

i\heie the ranges of a, i),, , h^, are given in (7.6) There is an expression 

forJI 5,j I which is entirely similar in form to the expression in (3 5) for | A„ I 
If ffi(vc) is true, *’ 


r < 

n 


(7 9) 


E[L,(,vc)Y = (n - 1 II ^{N ~ i, 2d) 

~h+\ 

h 

r 

a*“l 


X < 2: ^[{N - 1 + 2d){n - 1) + 1 - 0 , -2d (n - 1)] 


h n—X 

= nn 

a-l 3^1 


l( 


N - h — s — (n — l)(a - 1) 


). 


/iV— 1. 1 — 0 .s — 1^ 

V 2 2(ii - 1) ^ 


id = 0, 1, ...). 


If $\bar{H}_1(vc)$ is true, the exact distribution of $\bar{L}_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.9).





7e. The test $\bar{L}_1(m)$ for the hypothesis $\bar{H}_1(m)$. The sample criterion $\bar{L}_1(m)$,
for $\bar{H}_1(m)$ (see section 2) is


( 710 ) 


him) = 


Liimvc) 

Liivc) 



where \iim) is the likelihood-ratio criterion for Htim) and 1| iiiy ([ and jj So |1 are 
given in (7.8) and (7.6), respectively. _ 

If $\bar{H}_1(m)$ is true, the d-th moment $(d = 0, 1, \ldots)$ of $\bar{L}_1(m)$ is


(7.11) 


E[Liim)r = n n 







+ ^ 4- ^ ~ A 

2in — 1) n — 1/d 

1—0 . s — iN 

2in '- 1 ) n^l/i 


i 


id = 0,1, ■••) 

If $\bar{H}_1(m)$ is true, the exact distribution of $\bar{L}_1(m)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.11).


7f. Relations among $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ and among $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$. $L_1(mvc)$ is the product of $L_1(vc)$ and $L_1(m)$ (see (7.4)); moreover,
when $H_1(mvc)$ is true, the d-th moment $(d = 0, 1, \ldots)$ of $L_1(mvc)$ equals the
product of the d-th moments of $L_1(vc)$ and $L_1(m)$ (see (6.4), (7.2), and (7.5)).
From this result and the argument given after (6.8) it follows that when $H_1(mvc)$
is true, $L_1(mvc)$ is the product of two independent chance quantities, namely,
$L_1(vc)$ and $L_1(m)$. Similarly, when $\bar{H}_1(mvc)$ is true, $\bar{L}_1(mvc)$ is the product of
two independent chance quantities, namely, $\bar{L}_1(vc)$ and $\bar{L}_1(m)$.


7g. Exact distributions of single sample criteria in special cases. For a sample
of size N and a partition $(1^b, n_1, \ldots, n_h)$ of the t variates (see section 2)
let the cumulative distribution function (c.d.f.) of $L_1(mvc)$, when $H_1(mvc)$ is
true, be

(7.12) $F(u \mid 1^b, n_1, \ldots, n_h \mid N) = \mathrm{Prob}\{L_1(mvc) < u\}$;

also, let $F(y \mid 1^b, n_1, \ldots, n_h \mid N)$ and $F(z \mid 1^b, n_1, \ldots, n_h \mid N)$ be the c.d.f.'s
of $L_1(vc)$ and $L_1(m)$ when $H_1(vc)$ and $H_1(m)$ are true, respectively. Let
$\bar{F}(u \mid n^h \mid N)$, $\bar{F}(y \mid n^h \mid N)$, and $\bar{F}(z \mid n^h \mid N)$ be the c.d.f.'s of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$ when $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, and $\bar{H}_1(m)$ are true, respectively.

It can be shown that

(7.13)

$F(u \mid 1^b, 2 \mid N) = I_u[(N - b - 2)/2,\ (b + 2)/2]$,

$F(u \mid 1^b, 3 \mid N) = I_{\sqrt{u}}[N - b - 3,\ b + 3]$,

$F(y \mid 1^b, 2 \mid N) = I_y[(N - b - 2)/2,\ (b + 1)/2]$,

$F(y \mid 1^b, 3 \mid N) = I_{\sqrt{y}}[N - b - 3,\ b + 2]$,

^'(2 I l'’, ft I AT) = [(jv - l)(ft - l)/2, (ft - l)/2], [s' = 

$\bar{F}(u \mid 2^2 \mid N) = I_{\sqrt{u}}[N - 4,\ 3]$,

$\bar{F}(y \mid 2^2 \mid N) = I_{\sqrt{y}}[N - 4,\ 2]$,

F(z I I m = h. [(N - Din - 1) - 1, ft - 1], W = 

where $I_\theta(P, Q)$ is defined in (6.5).

Distributions of the criteria in certain cases where the normal distribution is
completely symmetric (see section 2) are given in [5].
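The passage from a product of two beta moments to a single incomplete beta ratio in the square root of the criterion can be checked numerically. The sketch below does this for the partition $(1^b, 3)$, assuming that the parameters in (6.9) read $p = (N - q - s)/2$ and $p + q = N/2 + (s - 1)/(n - 1)$ for $s = 1, 2$ (an assumed reading, since the printed expressions are partly illegible); the function names are hypothetical.

```python
from math import lgamma, exp


def rising(t, d):
    """(t)_d = Gamma(t + d) / Gamma(t)."""
    return exp(lgamma(t + d) - lgamma(t))


def product_form(N, b, d):
    """d-th moment of L_1(mvc) for the partition (1^b, 3) as a product of two beta moments."""
    q = b + 1                                # q = b + h with h = 1
    value = 1.0
    for s in (1, 2):                         # s runs from 1 to n - 1 with n = 3
        p = (N - q - s) / 2.0                # assumed reading of the numerator parameter
        p_plus_q = N / 2.0 + (s - 1) / 2.0   # assumed reading of the denominator parameter
        value *= rising(p, d) / rising(p_plus_q, d)
    return value


def single_beta_form(N, b, d):
    """Moment of order 2d of a beta variate with parameters N - b - 3 and b + 3."""
    return rising(N - b - 3, 2 * d) / rising(N, 2 * d)
```

Under these assumptions the two functions agree for every d, which is the duplication formula for the Gamma function at work; it is consistent with the square root of the criterion following a single beta distribution.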

7h. Asymptotic distributions of the single sample criteria. When the sample
size, N, is large, we may use a theorem [6] (see also [9, pp. 151-2]) concerning
the approximate distribution of the likelihood-ratio criterion. For large N the
distributions of the quantities $-N \ln L_1(mvc)$, $-N \ln L_1(vc)$, and $-N \ln L_1(m)$
(when $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$, respectively, are true) are approximately
chi-square distributions with $(1/2)[t(t + 3) - b(b + 3) - h(h + 5)] - bh$,
$(1/2)[t(t + 1) - b(b + 1) - h(h + 3)] - bh$, and $t - b - h$ degrees of freedom,
respectively. Also, for large N the distributions of the quantities
$-N \ln \bar{L}_1(mvc)$, $-N \ln \bar{L}_1(vc)$, and $-N \ln \bar{L}_1(m)$ (when $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, and
$\bar{H}_1(m)$, respectively, are true) are approximately chi-square distributions with
$[t(t + 3)/2 - h(h + 2)]$, $[t(t + 1)/2 - h(h + 1)]$, and $t - h$ degrees of freedom,
respectively.
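The degrees of freedom stated above are simple to tabulate. A minimal sketch follows; the function names are hypothetical, and the two calls use the partitions of a 4-variate distribution into (1, 3) and into two groups of two.

```python
def df_single_sample(t, b, h):
    """Degrees of freedom for -N ln L_1(mvc), -N ln L_1(vc), -N ln L_1(m)."""
    df_mvc = (t * (t + 3) - b * (b + 3) - h * (h + 5)) // 2 - b * h
    df_vc = (t * (t + 1) - b * (b + 1) - h * (h + 3)) // 2 - b * h
    df_m = t - b - h
    return df_mvc, df_vc, df_m


def df_single_sample_bar(t, h):
    """Degrees of freedom for the criteria attached to the barred hypotheses."""
    df_mvc = t * (t + 3) // 2 - h * (h + 2)
    df_vc = t * (t + 1) // 2 - h * (h + 1)
    df_m = t - h
    return df_mvc, df_vc, df_m


partition_1_3 = df_single_sample(t=4, b=1, h=1)
partition_2_2 = df_single_sample_bar(t=4, h=2)
```

In both cases the first count is the sum of the other two, in line with the product relation between the criteria.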

8. k-Sample Criteria. In this section solutions of problems (i) and (ii) (see
section 1) are given for the three $H_k$ and the three $\bar{H}_k$ hypotheses (all stated
in section 2).

A test of any of these hypotheses is based on k simple, random samples $(k > 2)$
from k compound-symmetric, normal t-variate distributions. The probability
density function, Q, of the k samples, say, $O_{N_p}$ $(p = 1, \ldots, k;\ N_p > b + h)$ is

(81) Q = [n 

X OXp [*“ Gx],p{Xiaj, 

If 

iN' = = 1, ,t), where is the aj,-th sample value of the 

p—i 

i-th variate in the p-th population $(\alpha_p = 1, \ldots, N_p)$, $m_{i,p}$ is the mean (expected
value) of the i-th variate in the p-th population, and $(1/2)\,\|G_{ij,p}\|^{-1}$ is the
variance-covariance matrix of the variates in the p-th population (see (3.1)).
For a given set of k samples Q is the likelihood function of the parameters
$G_{ij,p}$ and $m_{i,p}$ $(i, j = 1, \ldots, t;\ p = 1, \ldots, k)$.






The six hypotheses under consideration (see section 2) can be restated in terms
of $G_{ij,p}$ and $m_{i,p}$; e.g., $H_k(MVC \mid mvc)$ asserts that $m_{i,1} = m_{i,2} = \cdots = m_{i,k}$
and $\|G_{ij,1}\| = \|G_{ij,2}\| = \cdots = \|G_{ij,k}\|$, given that for all p the vector
$(m_{1,p}, \ldots, m_{t,p})$ is block symmetric and the matrix $\|G_{ij,p}\|$ is block symmetric
(of type I) for a preassigned partition $(1^b, n_1, \ldots, n_h)$ of the t variates (see
sections 2 and 3).


8a. Expressions for the criteria. Let $\lambda_k(MVC \mid mvc), \ldots, \bar{\lambda}_k(M \mid mVC)$ represent
the likelihood-ratio criteria for the six hypotheses $H_k(MVC \mid mvc), \ldots,$
$\bar{H}_k(M \mid mVC)$, respectively, and let $L_k(MVC \mid mvc), \ldots, \bar{L}_k(M \mid mVC)$ be the
sample criteria for the respective hypotheses. We choose the $L_k$ as follows:

LiiMVC 1 mvc) = MMVC \ mvc)f, 

LkiVC 1 mrc) = \\kiMC | mvc)f , 

(8 2) LkiM I mVC) = MM \ mYC)f''' , 

_ jLkiMVC I , 

\ Lh{VC I mvc) j ’ 


the expressions for $\bar{L}_k(MVC \mid mvc)$, $\bar{L}_k(VC \mid mvc)$, and $\bar{L}_k(M \mid mVC)$ are the same
as those in (8.2) with $\lambda_k$ replaced by $\bar{\lambda}_k$. The $\lambda_k$ and $\bar{\lambda}_k$ can be obtained explicitly
by straightforward application of the likelihood-ratio method (see the paragraph
preceding section 4a).


8b. Moments of the k-sample criteria. The exact distribution of any of the
k-sample criteria, when the corresponding null hypothesis is true, is given in
(6.10), where the quantities $p_j$, $q_j$, and $c$ can be specified by means of the moment
expressions given below. The moments have been obtained by means of the
operators discussed in Section 5.

For each of the following six moment expressions the null hypothesis, corresponding
to the sample criterion involved, is assumed to be true:







X 


'nn'-ra 

p=l fl=l \ 2 i 


+ 


« ^ 1) y 

N-pijla 1) / d ^ ^ 


fr"'n~7^ . (u' -1) \ 

I V2 N'(na - l)Jd J 

— fc -|- 1 — r'' 


E[Ll{M 1 viVOf = n 


E[Lk(MVC I mvc)f 


N' - r\ 

2 )a 

^ ?i JV' /i T ' 

nni - — + 


EiUVC 1 mvc)Y = 


iiiii V2 ^ 2NM - 1) N^n - l)/<i 

n"n"’ (^~ + 1-^ , \ ’ 

I tA u-ii \2 ^ 22^'(« - 1) N'{n - l)/a J 

n n ft f- - — + 

V2 2Np Np 


flfl (I - * ~ 


fljal t^sal 


2N' 


N' 


fft 4- + (^; -1) ^ 1 

iiiH \2 2Nj,{n - 1) _Np{n - l) Jd 1,. 

1 ^ ~" I ~ r 

I M Ak \2 ^ 2N'{n - 1) ^ N'(n - l)Ja J 


E[U{M\mVC)Y = ft 

a=l 


N' -k + l - a 


\ m 

where $d = 0, 1, \ldots$ and $(T)_d$ is defined in (6.4).


8c. Comments on the criteria. By an argument similar to that used in section 7f
it follows from (8.3) that when $H_k(MVC \mid mvc)$ is true $L_k(MVC \mid mvc)$ is
the product of two independently distributed chance quantities, namely,
$L_k(VC \mid mvc)$ and $[L_k(M \mid mVC)]^{N'}$. The same assertion holds true if we replace
each $L$ by $\bar{L}$ and $H$ by $\bar{H}$.

Exact distributions of the k-sample criteria, when the corresponding null
hypotheses are true, can be obtained explicitly for special values of k and special
compound symmetries, but owing to lack of space we shall not consider them
in this paper.





When the sample size $N'$ is large, the exact distributions of

$-\ln L_k(MVC \mid mvc)$, $-\ln L_k(VC \mid mvc)$, $-N' \ln L_k(M \mid mVC)$,

$-\ln \bar{L}_k(MVC \mid mvc)$, $-\ln \bar{L}_k(VC \mid mvc)$,

and $-N' \ln \bar{L}_k(M \mid mVC)$ (if the corresponding null hypotheses, respectively,
are true) are approximately chi-square distributions with

$(k - 1)[b(b + 3)/2 + bh + h(h + 5)/2]$,

$(k - 1)[b(b + 1)/2 + bh + h(h + 3)/2]$,

$q(k - 1)$, $h(h + 2)(k - 1)$, $h(h + 1)(k - 1)$, and $h(k - 1)$ degrees of freedom,
respectively.
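The six degrees-of-freedom counts for the k-sample criteria can be tabulated as follows. This is a sketch only: the function names are hypothetical, and the first count is taken to be the sum of the second and third (an assumption, since the printed expression for it is not fully legible).

```python
def df_k_sample(k, b, h):
    """Degrees of freedom for the unbarred k-sample criteria (MVC, VC, M)."""
    q = b + h
    df_vc = (k - 1) * (b * (b + 1) // 2 + b * h + h * (h + 3) // 2)
    df_m = q * (k - 1)
    # Assumed form of the first count: b(b + 3)/2 + bh + h(h + 5)/2 per extra sample.
    df_mvc = (k - 1) * (b * (b + 3) // 2 + b * h + h * (h + 5) // 2)
    return df_mvc, df_vc, df_m


def df_k_sample_bar(k, h):
    """Degrees of freedom for the barred k-sample criteria (MVC, VC, M)."""
    return h * (h + 2) * (k - 1), h * (h + 1) * (k - 1), h * (k - 1)


two_samples = df_k_sample(k=2, b=1, h=1)
three_samples_bar = df_k_sample_bar(k=3, h=2)
```

For both families the count for the combined hypothesis equals the sum of the counts for its two components.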


9. Illustrative examples. The first of the following two examples¹ illustrates
the use of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ in a psychometrics experiment; the second
example illustrates the use of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ in a medical-research
experiment (see section 1).

Example 1. In an experiment to establish methods of obtaining reader
reliability in regard to essay scoring, 126 examinees were given a three-part
English Composition examination. Each part required that the examinee write
an essay, and for each examinee four scores were obtained on the following four
things, respectively: (1) the part-2 and part-3 essays together, (2) the original
part-1 essay, (3) a long-hand copy of the part-1 essay, (4) a carbon copy of the
long-hand copy in (3). Scores were assigned by a group of “English Readers”
using procedures designed to counterbalance certain experimental conditions.
The score on (1) serves as a criterion. The experimenter asks whether on the
basis of the sample (of size 126) the quantities associated with (2), (3), and (4)
can be considered as interchangeable among themselves and interchangeable
with respect to their relation to the criterion (1).

Let $X_1$, $X_2$, $X_3$, and $X_4$ be the scores on (1), (2), (3), and (4), respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of scores $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \ldots, 126)$ obtained from
the essays is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
three questions will be considered (see section 2), where the grouping of the four
variates is (1, 3): (a) Is the sample consistent with the hypothesis $H_1(mvc)$?
(b) Is the sample consistent with the hypothesis $H_1(vc)$? (c) Is the sample
consistent with the hypothesis $H_1(m)$? In the particular experiment under
discussion (a) is the experimenter's question.


¹ Mr. L. R. Tucker (Educational Testing Service, Princeton, New Jersey) and Captain
J. Allan Rafferty, M.D. (Air University School of Aviation Medicine, Randolph Field,
Texas) kindly gave the author the data for Examples 1 and 2, respectively.





The sample means and variance-covariance matrix are as follows:

            X1        X2        X3        X4
X1       77.8976   20.9425   23.4544   18.0384
X2       20.9425   25.0704   12.4363   11.7257
X3       23.4544   12.4363   28.2021    9.2281
X4       18.0384   11.7257    9.2281   22.7390

Means    28.0556   14.9048   15.4841   14.4444

This matrix is $(1/126)\,\|v_{ij}\|$ $(i, j = 1, \ldots, 4)$ (see (4.3)). The sample criteria
$L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ will be used to answer questions (a), (b), and (c),
respectively. The values of the criteria can be computed from the values of the
three determinants appearing in (4.9), (7.1), and (7.3): $|v_{ij}|$, $|v'_{ij}|$, where $v'_{ij}$
is given in (4.7), and the determinant of the matrix given below (7.1). The
off-diagonal elements of the latter matrix are evaluated by simple averaging
of certain elements in $\|v_{ij}\|$. Both $|v'_{ij}|$ and the determinant of the matrix
given below (7.1) have the block pattern of (3.2) and can be expressed in the
simplified form of (3.3), where $h = 1$ and $n_1 = 3$; the simplified form of $|v'_{ij}|$
can also be obtained from (4.10) and (4.11).

From the data above it is found that 

Li(mva) = \v,,\/\v[,\ = .9214, 

Li(yc) = I 1 / I I = .9568, 

Li(m) = \ v,,\ / \v[,\ = .9630. 

The second, fourth, and fifth formulas in (7.13) (for $N = 126$, $b = 1$, $n = 3$)
give the distributions of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$, respectively (when the
hypothesis with which the criterion is associated is true). By direct computation
with expressions for the Incomplete Beta Function ratios the per cent points
corresponding to the observed values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ are found
to be .26, .49, and .09, respectively. Thus at the 5% significance level the
answer to any given one of the three questions (a), (b), (c) is yes. Critical
values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ for various significance levels can be
obtained from [3] by interpolation.
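The first two per cent points of Example 1 can be reproduced with a few lines of code. The sketch below assumes that the second and fourth formulas in (7.13) are incomplete beta ratios evaluated at the square root of the criterion, with parameters $(N - b - 3,\ b + 3)$ and $(N - b - 3,\ b + 2)$; for integer parameters the ratio reduces to a binomial tail sum. The helper name `inc_beta_int` is hypothetical.

```python
from math import comb, sqrt


def inc_beta_int(x, p, q):
    """Incomplete beta ratio I_x(p, q) for positive integers p, q (binomial tail sum)."""
    n = p + q - 1
    return sum(comb(n, j) * x**j * (1.0 - x)**(n - j) for j in range(p, n + 1))


N, b = 126, 1
L_mvc, L_vc, L_m = 0.9214, 0.9568, 0.9630   # observed values quoted in Example 1

pct_mvc = inc_beta_int(sqrt(L_mvc), N - b - 3, b + 3)   # assumed second formula in (7.13)
pct_vc = inc_beta_int(sqrt(L_vc), N - b - 3, b + 2)     # assumed fourth formula in (7.13)
product_check = L_vc * L_m                              # relation (7.4)
```

Under these assumptions `pct_mvc` and `pct_vc` round to .26 and .49, and `product_check` agrees with the observed $L_1(mvc)$ to the printed accuracy.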

Example 2. In an experiment to study certain properties of the blood of
asphyxiated dogs, the %CO₂ and hematocrit of 10 asphyxiated dogs were measured
four minutes and seven minutes after asphyxiation. Let $X_1$ and $X_3$ be
%CO₂ and hematocrit four minutes after asphyxiation, respectively, and $X_2$
and $X_4$ be %CO₂ and hematocrit seven minutes after asphyxiation, respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of measurements $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \ldots, 10)$ obtained
from the 10 dogs is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
questions will be considered, where the grouping is $(2^2)$: (a) Is the sample
consistent with the hypothesis $\bar{H}_1(mvc)$? (b) Is the sample consistent with the
hypothesis $\bar{H}_1(vc)$? (c) Is the sample consistent with the hypothesis $\bar{H}_1(m)$?
In the particular experiment under discussion (a) is the experimenter's question.





The sample means and sums of squares and cross-products are as follows:

            X1         X2         X3         X4
X1       294.916    313.908    -89.364    -69.282
X2       313.908    363.689   -130.422    -69.261
X3       -89.364   -130.422    210.356    241.688
X4       -69.282    -69.261    241.688    515.789

Means     50.780     53.590     41.180     43.890

This matrix is $\|v_{ij}\|$ $(i, j = 1, \ldots, 4)$ (see (4.3)). The sample criteria $\bar{L}_1(mvc)$,
$\bar{L}_1(vc)$, and $\bar{L}_1(m)$ will be used to answer questions (a), (b), and (c), respectively.
The values of these criteria can be computed from the data above (see (7.6),
(7.8), and (7.10)) and are found to be:

Zy(mvc) = I I'll 1 / 1 I = 09107, 

Li{vc) = \vt,\/ \ vij \ == .3259, 

Ii(m) = 1 Vii 1 / I d!, 1 = .2794. 

The sixth, seventh, and eighth formulas in (7.13) (for $N = 10$, $n = 2$) give the
distributions of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$, respectively (when the hypothesis
with which the criterion is associated is true). From [1] it is found that the
observed values of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ correspond to the 1.2, 12.4, and
0 per cent points, respectively, of the distributions referred to above. Thus
at the 5% significance level the answer to questions (a) and (c) is no and to (b)
is yes. The critical values of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ for various significance
levels can be found from [3].
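The per cent points for the first two criteria of Example 2 can be recomputed in the same spirit. The sketch assumes that the sixth and seventh formulas in (7.13) are incomplete beta ratios at the square root of the criterion with parameters $(N - 4,\ 3)$ and $(N - 4,\ 2)$; the helper name and the 5% decision variables are hypothetical.

```python
from math import comb, sqrt


def inc_beta_int(x, p, q):
    """Incomplete beta ratio I_x(p, q) for positive integers p, q (binomial tail sum)."""
    n = p + q - 1
    return sum(comb(n, j) * x**j * (1.0 - x)**(n - j) for j in range(p, n + 1))


N = 10
Lbar_mvc, Lbar_vc, Lbar_m = 0.09107, 0.3259, 0.2794   # observed values quoted in Example 2

pct_mvc = inc_beta_int(sqrt(Lbar_mvc), N - 4, 3)   # assumed sixth formula in (7.13)
pct_vc = inc_beta_int(sqrt(Lbar_vc), N - 4, 2)     # assumed seventh formula in (7.13)

level = 0.05
reject_mvc = pct_mvc < level   # small values of the criterion are significant
reject_vc = pct_vc < level
product_check = Lbar_vc * Lbar_m
```

Under these assumptions the two ratios come out near 1.2 and 12.4 per cent, so question (a) is answered no and question (b) yes at the 5% level; the product of the last two observed values also agrees with the first to the printed accuracy.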

More than one of the sample criteria may be of interest in regard to a given
sample (see [5], pp. 267-268). For example, in an experiment such as that
described in Example 1 suppose the answer to question (a) is no. The experimenter
might then consider question (b); if the answer is no, the inconsistency
of the sample with $H_1(mvc)$ might be regarded as due to the variances or
covariances. If the answer to (b) is yes, the experimenter might then consider (c);
if the answer here is no, the inconsistency of the sample with $H_1(mvc)$ might be
regarded as due to the means. If, however, the answer here is yes, further study
might be required to “explain” the inconsistency.

10. Acknowledgement. The author wishes to express his gratitude to Professor
S. S. Wilks under whose direction this paper was written. The author
also wishes to thank Professor J. W. Tukey for many valuable conversations on
certain mathematical aspects of the problems and to thank Dr. W. E. Kappauf,
J. Allan Rafferty, M.D., and Mr. L. R. Tucker for assistance on applications
of the test criteria.





REFERENCES

[1] K. Pearson, Tables of the Incomplete Beta Function, Cambridge University Press, 1932.

[2] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, American Mathematical
Society, 1943, pp. 9-12.

[3] Catherine M. Thompson, “Table of percentage points of the incomplete beta function,”
Biometrika, Vol. 32, Part II (1941), pp. 151-181.

[4] John W. Tukey and S. S. Wilks, “Approximation of the distribution of the product of
beta variables by a single beta variable,” Annals of Math. Stat., Vol. 17 (1946),
pp. 318-324.

[5] S. S. Wilks, “Sample criteria for testing equality of means, equality of variances,
and equality of covariances in a normal multivariate distribution,” Annals of
Math. Stat., Vol. 17 (1946), pp. 257-281.

[6] S. S. Wilks, “The large-sample distribution of the likelihood ratio for testing composite
hypotheses,” Annals of Math. Stat., Vol. 9 (1938), pp. 60-62.

[7] S. S. Wilks, “Certain generalizations in the analysis of variance,” Biometrika, Vol.
24 (1932), pp. 471-494.

[8] S. S. Wilks, “Moment-generating operators for determinants of product moments in
samples from a normal system,” Annals of Math., Vol. 35 (1934), pp. 312-340.

[9] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.



BRANCHING PROCESSES¹


By T. E. Harris

Project RAND, Douglas Aircraft Company

1. Summary. This paper is concerned with a simple mathematical model
for a branching stochastic process. Using the language of family trees we may
illustrate the process as follows. The probability that a man has exactly r
sons is $p_r$, $r = 0, 1, 2, \ldots$. Each of his sons (who together make up the first
generation) has the same probabilities of having a given number of sons of his
own; the second generation have again the same probabilities, and so on. Let
$z_n$ be the number of individuals in the nth generation. We study the probability
distribution of $z_n$. Some previous results are given in section 2; these include
procedures for computing moments of $z_n$, and a criterion for when the family
has probability 1 of dying out. In sections 3 and 4 the case is considered where
the family has a non-zero chance of surviving indefinitely. In this case the
random variables $z_n/Ez_n$ converge in probability to a random variable $w$ with
cumulative distribution $G(u)$. It is shown that $G(u)$ is absolutely continuous
for $u \ne 0$. Results of a Tauberian character are given for the behavior of $G(u)$
as $u \to 0$ and $u \to \infty$. In section 5 some examples are given where $G(u)$ can
be found explicitly; $G(u)$ is computed numerically for the case $p_1 = 0.4$, $p_2 = 0.6$.
In section 6 families with probability 1 of extinction are considered. A method
is given for obtaining in certain cases an expansion for the moment-generating
function of the number of generations before extinction occurs. In section 7
maximum likelihood estimates are obtained for the $p_r$ and for the expectation
$Ez_1$; consistency in a certain sense is proved. In section 8 a brief discussion
is given of the relation between two types of mathematical models for branching
processes.

2. Introduction. By a branching stochastic process is meant a phenomenon 
of the following general type: each of an initial aggregate of objects can give rise 
to more objects of the same or different types, the objects produced can then 
produce more, and the system develops, subject to certain probability laws. 
Examples are the development of human or animal populations, propagation of 
genes, and nuclear chain reactions. The mathematical model dealt with in this 
paper may be thought of as representing the generation-by-generation growth 
of a family, the fundamental random variable being the number of individuals 
in the nth generation. Under certain conditions, however, this model may 
describe the size of a family at a sequence of points in time. This question will 
be touched on in section 8. 


¹ Based on a doctoral dissertation presented to the Mathematics Department, Princeton
University, June, 1947.




Definition 2.1. The random variables $z_n$, $n = 0, 1, 2, \ldots$, will be said to
represent a simple discrete branching process provided: $z_0 = 1$; $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \ldots$, with $\sum_{r=0}^{\infty} p_r = 1$; the conditional distribution of $z_{n+1}$, given
$z_n = r$, is that of the sum of r independent random variables, each having the
same distribution as $z_1$.

Assumptions. Throughout this paper we assume that $\sum_{r=0}^{\infty} r^2 p_r < \infty$, that at
least two of the $p_r$ are positive, and that $p_0 + p_1 < 1$.

Definitions 2.2. Let $x = Ez_1 = \sum r p_r$, $\sigma^2 = \mathrm{Var}(z_1) = \sum r^2 p_r - x^2$. Let
$f(s) = \sum_{r=0}^{\infty} p_r s^r$ be the generating function of $z_1$ (s denotes a complex variable).
Let $p_{nr} = P(z_n = r)$ and $f_n(s) = \sum_{r=0}^{\infty} p_{nr} s^r$; of course $p_{1r} = p_r$ and $f_0(s) = s$. The
assumptions given above insure that the first and second derivatives $f'(s)$ and
$f''(s)$ are continuous in the set consisting of the interior of the unit circle and the
point $s = 1$; thus derivative notations such as $f''(1)$ are used even though $f(s)$
may not be analytic at $s = 1$. It will be seen shortly that a similar remark
applies to the functions $f_n(s)$ and certain functions to be introduced later.

In the remainder of this section we shall summarize certain results; most of
them are contained implicitly or explicitly in works by Fisher [1], Lotka [2],
Steffensen [3], Ulam and Hawkins [4], Kolmogoroff [5], Kolmogoroff and Dmitriev
[6], and Yaglom [7]; some of these references are not widely available.

From our definition, $P(z_{n+1} = k \mid z_n = j)$ is the coefficient of $s^k$ in $[f(s)]^j$.
Hence $p_{n+1,k}$ is the coefficient of $s^k$ in $\sum_{j=0}^{\infty} p_{nj}[f(s)]^j$, whence

(2.1) $f_{n+1}(s) = f_n[f(s)]$.

Letting $n = 1, 2, \ldots$, successively, it follows that the generating function of $z_n$
is the nth functional iterate of $f(s)$. Hence

(2.2) $f_{n+1}(s) = f[f_n(s)]$.

We note that $f_n'(1) = Ez_n$, $f_n''(1) + f_n'(1) - [f_n'(1)]^2 = \mathrm{Var}(z_n)$. Differentiation
of (2.1) at $s = 1$ gives $f_{n+1}'(1) = f_n'(1) f'(1)$; another differentiation gives
$f_{n+1}''(1) = f_n''(1)[f'(1)]^2 + f_n'(1) f''(1)$, while twofold differentiation of (2.2) gives
$f_{n+1}''(1) = f''(1)[f_n'(1)]^2 + f'(1) f_n''(1)$; these two expressions for $f_{n+1}''(1)$ can be
equated and solved for $f_n''(1)$, provided $x = f'(1) \ne 1$. Thus the mean and variance
of $z_n$ are given by $Ez_n = (Ez_1)^n = x^n$; $\mathrm{Var}(z_n) = \sigma^2 x^n (x^n - 1)/(x^2 - x)$, $x \ne 1$;
$\mathrm{Var}(z_n) = n\sigma^2$, $x = 1$. Higher moments, if they exist, may be found by a similar
process.
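The iteration (2.1) and the resulting mean and variance can be checked numerically when only finitely many of the $p_r$ are positive, since $f_n(s)$ is then a polynomial. The following is a minimal sketch; the offspring distribution and all function names are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(s)), by Horner's rule."""
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result


def generation_pmf(p, n):
    """Coefficients of f_n(s), i.e. the distribution of z_n, using (2.1)."""
    fn = [0.0, 1.0]            # f_0(s) = s
    for _ in range(n):
        fn = compose(fn, p)    # f_{k+1}(s) = f_k(f(s))
    return fn


def mean_var(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ..."""
    mean = sum(r * pr for r, pr in enumerate(pmf))
    second = sum(r * r * pr for r, pr in enumerate(pmf))
    return mean, second - mean * mean


p = [0.25, 0.25, 0.5]          # hypothetical offspring distribution p_0, p_1, p_2
x, sigma2 = mean_var(p)
n = 3
pmf_n = generation_pmf(p, n)
mean_n, var_n = mean_var(pmf_n)
formula_var = sigma2 * x**n * (x**n - 1) / (x**2 - x)
```

For this illustrative distribution the computed mean and variance of $z_3$ agree with $x^3$ and with the closed form for the variance.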

Definition 2.3. Denote by $a$ the smallest non-negative real root of the
equation $t = f(t)$. We see that $x \le 1$ implies $a = 1$ while $x > 1$ implies
$0 \le a < 1$, the equality $a = 0$ holding if and only if $p_0 = 0$. In no case can the
half-open interval $0 \le t < 1$ contain more than one root. It is readily seen that

(2.3) $\lim p_{n0} = \lim f_n(0) = a$.





We thus have the well known result: the number $a$ is the probability of eventual
extinction of the family. The relation between $a$ and $x$ shows that the probability
of extinction is 1 if and only if $x \le 1$.
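Relation (2.3) suggests a direct way of approximating the extinction probability: iterate $t \to f(t)$ starting from $t = 0$. The sketch below does this for two hypothetical offspring distributions, one with $x > 1$ and one with $x < 1$; the function names and the iteration count are illustrative choices.

```python
def f(t, p):
    """Generating function f(t) = sum_r p_r t^r for a finite offspring distribution."""
    return sum(pr * t**r for r, pr in enumerate(p))


def extinction_probability(p, iterations=500):
    """Approximate a = lim f_n(0) by iterating t -> f(t) from t = 0, as in (2.3)."""
    t = 0.0
    for _ in range(iterations):
        t = f(t, p)
    return t


supercritical = [0.25, 0.25, 0.5]   # hypothetical, x = 1.25 > 1
subcritical = [0.5, 0.3, 0.2]       # hypothetical, x = 0.7 < 1

a_super = extinction_probability(supercritical)
a_sub = extinction_probability(subcritical)
```

For the first distribution the equation $t = f(t)$ has the roots 1/2 and 1, and the iteration settles on the smaller one; for the second the only root in the unit interval is 1.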

It is also clear that $0 \le t < 1$ implies $\lim f_n(t) = a$; this, together with (2.3),
shows that

(2.4) $\lim_{n \to \infty} p_{nr} = 0$, $r = 1, 2, \ldots$.

Relation (2.4) means roughly that the family either dies out or gets very large.
In section 4 it will be shown that (2.4) holds uniformly in r.

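Definition 2.4. The random variables $w_n$ are defined by $w_n = z_n/x^n$.

Clearly $Ew_n = 1$ and $Ew_n^2 = 1 + \sigma^2 (1 - x^{-n})/(x^2 - x)$ if $x \ne 1$.

Suppose $n > m$. Then $E(z_n z_m) = \sum_r p_{mr} E(r z_n \mid z_m = r) = \sum_r p_{mr} r^2 x^{n-m} =
x^{n-m} Ez_m^2$. Thus $E(w_n w_m) = Ew_m^2$, whence

(2.5) $E(w_n - w_m)^2 = Ew_n^2 - Ew_m^2$, $n > m$.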

By virtue of (2.5) we obtain

Theorem 2.1. If $x > 1$, the random variables $w_n$ converge in mean square, hence in probability, to a random variable $w$.

For in this case $Ew_n^2 \to 1 + \dfrac{\sigma^2}{x^2 - x}$ as $n \to \infty$ and (2.5) shows that $E(w_n - w_m)^2 \to 0$ as $n$ and $m \to \infty$. Theorem 2.1 is then a consequence of [8], p. 38.

It is well known that convergence in mean square implies $Ew_n^2 \to Ew^2$ and $E(w_n - 1)^2 \to E(w - 1)^2$, whence $Ew_n \to Ew$.

Thus we have

(2.6)    $Ew = 1$,  $Ew^2 = 1 + \dfrac{\sigma^2}{x^2 - x}$.

In order to study the behavior of $z_n$ for large $n$ when $x > 1$, we consider the distribution of $w$.

Definitions 2.5. $G_n(u) = P(w_n \le u)$;  $\varphi_n(s) = E(e^{sw_n}) = \int_{0-}^{\infty} e^{su}\,dG_n(u)$.

Definitions 2.6. (Applicable when $x > 1$.) $G(u) = P(w \le u)$;  $\varphi(s) = E(e^{sw}) = \int_{0-}^{\infty} e^{su}\,dG(u)$. We shall refer to $G(u)$ as the asymptotic distribution branching from $f(s)$.

The moment-generating functions (m.g.f.'s) $\varphi_n(s)$ and $\varphi(s)$ are defined at least for $\mathrm{Re}(s) \le 0$. Unless specifically stated otherwise we shall consider them only in that domain.

From (2.2) and the fact that $\varphi_n(s) = f_n(e^{s/x^n})$ it follows that $\varphi_{n+1}(sx) = f[\varphi_n(s)]$.

Theorem 2.1 implies that if $x > 1$, $G_n(u) \to G(u)$ and $\varphi_n(s) \to \varphi(s)$ for $\mathrm{Re}(s) \le 0$. Thus the m.g.f. $\varphi(s)$ satisfies the functional equation

(2.7)    $\varphi(sx) = f[\varphi(s)]$,  $\mathrm{Re}(s) \le 0$.



BRANCHING PROCESSES




Equation (2.7), which, of course, is applicable only when $x > 1$, was obtained in a different form by Ulam and Hawkins. It belongs to a type usually known as Koenigs' equation, after the nineteenth century mathematician who studied it in connection with functional iteration, and is related to an equation studied by Abel. We shall make some use of the work of Koenigs later. See Hadamard [9] and Koenigs [10].

We note that $Ew^r < \infty$ if and only if $Ez_1^r < \infty$. It was already pointed out that $Ew = 1$. As pointed out in [4], as many further moments of $w$ as exist may be found by successive differentiation of (2.7) at $s = 0$.

Finally we note that $G_n(0) = p_{n0}$. Hence $\lim G_n(0) = a$. Thus $G(0) = P(w = 0) \ge a$. We show later that $G(0) = a$. Clearly $G(u) = 0$ for $u < 0$.

In sections 3 and 4 we always assume $x > 1$.


3. Asymptotic properties of the moment-generating function. We first show that (2.7) uniquely determines the distribution of $w$. Specifically,

Theorem 3.1. Let $G_1(u)$ and $G_2(u)$ be distributions with equal first moments and finite second moments whose characteristic functions $\varphi_1(it)$ and $\varphi_2(it)$ satisfy ($t$ is real) $\varphi_r(itx) = f[\varphi_r(it)]$, $r = 1, 2$. Then $G_1(u) = G_2(u)$.

From [13], p. 27, $\varphi_1(it) - \varphi_2(it) = t^2\beta(t)$, where $\beta(t)$ is bounded as $t \to 0$. From (2.7),

$|\varphi_1(itx) - \varphi_2(itx)| = |f[\varphi_1(it)] - f[\varphi_2(it)]| \le x\,|\varphi_1(it) - \varphi_2(it)|$,

since $|f'(s)| \le x$ when $|s| \le 1$. Hence for $t \ne 0$,

$|\beta(t/x)| \ge x\,|\beta(t)|$.

Thus $\beta(t)$ cannot be bounded near $t = 0$ unless it is identically zero, hence $\varphi_1(it) = \varphi_2(it)$.

It is clear that the requirement that $\varphi(s)$ have the form $1 + s + O(s^2)$ between two rays from the origin is sufficient for the uniqueness in that domain of solutions of (2.7). On the other hand, continuous solutions can be constructed at will if the existence of a derivative near $s = 0$ is not required.

Before proceeding further, it is convenient to define three functions $k(s)$, $\psi(s)$, and $H(u)$ which are closely related to $f(s)$, $\varphi(s)$, and $G(u)$ respectively. We repeat that we are considering only the case $x > 1$. See definition 2.3 for $a$.

Definitions 3.1. Let $k(s) = \dfrac{f[(1-a)s + a] - a}{1 - a}$. Clearly $k(s)$ is a probability generating function with $k(0) = 0$, $k'(1) = f'(1) = x$, $k''(1) < \infty$. We write $k(s) = \sum_{r=1}^{\infty} q_r s^r$. We also define the iterates $k_n(s)$ by

$k_0(s) = s$,  $k_{n+1}(s) = k[k_n(s)]$.

Definitions 3.2. Let $H(u)$ be the asymptotic distribution branching from $k(s)$. (See Definition 2.6.) Let $\psi(s)$ be the corresponding moment-generating function. We know then that $\psi(s)$ and $k(s)$ satisfy

(3.1)    $\psi(sx) = k[\psi(s)]$.




In view of the uniqueness theorem we have, by direct substitution in (3.1), that $\psi(s)$ must be given by

(3.2)    $\psi(s) = \dfrac{\varphi[(1-a)s] - a}{1 - a}$,

and that $H(u)$ must be given by

(3.3)    $H(u) = \dfrac{G\left[\dfrac{u}{1-a}\right] - a}{1 - a}$, $u \ge 0$;    $H(u) = 0$, $u < 0$.

We shall see later that $H(0) = 0$; i.e., that $G(0) = a$. Therefore $H(u)$ is the conditional distribution of $(1 - a)w$, given that $w \ne 0$. Another way of stating this is as follows:

Theorem 3.2. The random variable $w$ is distributed as the product of two independent random variables $w_0 \cdot w'$, where $w_0$ takes the values 0 and $\dfrac{1}{1-a}$ with probabilities $a$ and $1 - a$ respectively while $w'$ has the asymptotic distribution branching from $k(s)$.

For it is directly verifiable that $\varphi(s)$ is the m.g.f. of $w_0 w'$.

In theorems 3.3 and 3.4 we consider the behavior of $\psi(s)$ for large $|s|$. To make for smoother reading we defer the proofs till section 9, where somewhat more general formulations are given. In section 4 the properties of $\psi(s)$ are interpreted in terms of $G(u)$.

Definition 3.3. Let $\gamma = \log_x \dfrac{1}{q_1} = \log_x \dfrac{1}{f'(a)}$. (See definitions 2.3 and 3.1.) If $q_1 = 0$ (i.e., $p_0 = p_1 = 0$) we take $\gamma = \infty$.

Theorem 3.3. Suppose $\gamma < \infty$. Then if $\mathrm{Re}(s) \le 0$ and $s \ne 0$,

(3.4)    $\psi(s) = \dfrac{M(s)}{|s|^{\gamma}} + M_0(s)$.

$M(s)$ is continuous for $s \ne 0$; $M(s)$ and $M_0(s)$ satisfy respectively

(3.5)    $M(sx) = M(s)$,    $M_0(s) = O\left(\dfrac{1}{|s|^{2\gamma}}\right)$, $|s| \to \infty$.

Remarks. (See section 9 for proof.) (a) Under the conditions of the theorem $M(s)$ is real and positive when $s$ is real and negative. (b) If $Ez_1^r < \infty$ and the conditions of the theorem hold, the $r$th derivative of $\psi(s)$ satisfies

(3.6)    $\psi^{(r)}(s) = O\left(\dfrac{1}{|s|^{\gamma + r}}\right)$, $|s| \to \infty$.

(c) If $\gamma = \infty$, $\psi(s)$ and as many derivatives as exist approach 0 exponentially as $|s| \to \infty$.

We now consider the behavior of $\psi(s)$ on the positive real axis, provided it is defined there.





Lemma 3.1. Let $f(s)$ be analytic in the circle $|s| < \alpha$, $\alpha > 1$. Then $\varphi(s)$ and $\psi(s)$ are analytic in some neighborhood of $s = 0$.

We use a theorem of Poincaré [11] which insures that there is exactly one function $\tilde{\varphi}(s)$ analytic near $s = 0$ with $\tilde{\varphi}(0) = \tilde{\varphi}'(0) = 1$ and satisfying

$\tilde{\varphi}(sx) = f[\tilde{\varphi}(s)]$.

(Although Poincaré's proof is for the case $f(s)$ rational, it applies equally well here.) The circle of convergence of the Maclaurin series for $\tilde{\varphi}(s)$ has radius $R$ where $\tilde{\varphi}(R) = \alpha$. An argument whose details are given in [12], p. 21, then shows that $\varphi(s) = \tilde{\varphi}(s)$ for $|s| < R$, and Lemma 3.1 follows. (The argument is necessary to rule out the possibility that the $\varphi_n(s)$ converge to $\tilde{\varphi}(s)$ for $\mathrm{Re}(s) \le 0$ but to some other function for $\mathrm{Re}(s) > 0$.) Clearly $\varphi(s)$ and $\psi(s)$ are entire if and only if $f(s)$ is entire.

Lemma 3.1 is useful for actual computation of $G(u)$. The (non-negative) coefficients $c_r$ in the series $\varphi(s) = 1 + s + c_2 s^2 + \cdots$ can be determined by differentiating (2.7) at $s = 0$. The series can be used to compute values of the characteristic function $\varphi(it)$ on some interval $t_0 \le t \le t_0 x$, where $t_0$ is a small real number; the values of $\varphi(it)$ for the remaining values of $t$ are determined by (2.7). (Note that the real and imaginary parts of $\varphi(it)$ are respectively even and odd.) Then the usual inversion formula is used to obtain $G(u)$. A numerical example of this procedure is worked out in section 5.
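The first step of this procedure, finding the coefficients $c_r$, can be sketched in a few lines. The code below is an illustration under stated assumptions, not the computation used in the paper: it takes $f(s) = p_0 + p_1 s + p_2 s^2$ and matches coefficients of $s^n$ on both sides of (2.7), which gives $c_n(x^n - x) = p_2\sum_{j=1}^{n-1} c_j c_{n-j}$ for $n \ge 2$. The function names and the test point $s = -0.3$ are hypothetical.

```python
def phi_coefficients(p1, p2, n_terms=30):
    """c_0, ..., c_{n_terms-1} of phi(s) = sum c_n s^n solving phi(x s) = f[phi(s)]."""
    x = p1 + 2.0 * p2                     # x = f'(1)
    c = [1.0, 1.0]                        # phi(0) = 1 and phi'(0) = Ew = 1
    for n in range(2, n_terms):
        conv = sum(c[j] * c[n - j] for j in range(1, n))
        c.append(p2 * conv / (x**n - x))
    return c


def evaluate(c, s):
    """Partial sum of the power series at the point s."""
    return sum(cn * s**n for n, cn in enumerate(c))


P1, P2 = 0.4, 0.6                         # the case f(s) = 0.4 s + 0.6 s^2
P0 = 1.0 - P1 - P2
X = P1 + 2.0 * P2
C = phi_coefficients(P1, P2)

s = -0.3                                  # assumed test point with Re(s) <= 0
lhs = evaluate(C, X * s)                  # phi(s x)
rhs = P0 + P1 * evaluate(C, s) + P2 * evaluate(C, s) ** 2   # f[phi(s)]
second_moment = 2.0 * C[2]                # Ew^2 = phi''(0)
print(lhs, rhs, second_moment)
```

For this offspring law the second coefficient reproduces $Ew^2 = 1 + \sigma^2/(x^2 - x) = 1.25$ from (2.6). The inversion step for $G(u)$ is not attempted here.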

Definition 3.4. The number $\rho$ is defined by $\rho = \log_x d$ if $f(s)$ is a polynomial of degree $d$, $\rho = \infty$ otherwise.

Theorem 3.4. Let $f(s)$ (and hence $k(s)$) be a polynomial of degree $d$. Then for $s > 0$

$\log \psi(s) = s^{\rho}L(s) + L_0(s)$;

$L(s)$ is continuous and positive; $L(s)$ and $L_0(s)$ satisfy respectively

$L(sx) = L(s)$,    $L_0(s) = O(1)$, $s \to \infty$.

The proof is in section 9. (Theorem 3.4 may be compared with a more widely applicable but less precise result due to Shah [19].)

Corollary. If $f(s)$ is a polynomial of degree $d$, $\psi(s)$ is an entire function of order $\rho$ and type $C$ where $C = \mathrm{Max}\, L(s)$, $1 \le s \le x$.

An explicit determination for $C$ has not been found. An approximate numerical determination is not difficult; the function

$L(s) = \lim_{n \to \infty} \dfrac{\log k_n[\psi(s)]}{s^{\rho} d^n}$

can be determined numerically for a number of values on some convenient interval $s_0 \le s \le s_0 x$, and the maximum value approximated. The importance of $C$ will be indicated in the conjecture following Theorem 4.3. We may also mention that the quantity $[\mathrm{Max}\, L(s) - \mathrm{Min}\, L(s)]$, $1 \le s \le x$, is of some interest. Some numerical work indicates that in certain cases $L(s)$ is at least approximately constant.
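The numerical determination of $L(s)$ can be sketched as follows for the quadratic case $k(s) = f(s) = 0.4s + 0.6s^2$ (so $d = 2$ and $\psi = \varphi$). This is a hedged illustration, not the computation reported in the paper: $\psi(s)$ is evaluated from its power series, and the iterates are carried in logarithmic form, using $\log k(y) = 2\log y + \log(p_2 + p_1/y)$, to avoid overflow. The function names, the number of series terms, and the iteration count are assumptions.

```python
import math

P1, P2 = 0.4, 0.6                 # k(s) = f(s) = 0.4 s + 0.6 s^2 (assumed example)
X = P1 + 2.0 * P2                 # x = 1.6
D = 2                             # degree of the polynomial
RHO = math.log(D) / math.log(X)   # rho = log_x d


def psi_coefficients(n_terms=80):
    """Power-series coefficients of psi, from psi(x s) = k[psi(s)], psi(0) = psi'(0) = 1."""
    c = [1.0, 1.0]
    for n in range(2, n_terms):
        conv = sum(c[j] * c[n - j] for j in range(1, n))
        c.append(P2 * conv / (X**n - X))
    return c


COEFFS = psi_coefficients()


def psi(s):
    return sum(cn * s**n for n, cn in enumerate(COEFFS))


def L_estimate(s, n_iter=40):
    """Approximate L(s) = lim log k_n[psi(s)] / (s^rho d^n) for s > 0."""
    log_y = math.log(psi(s))
    for _ in range(n_iter):
        # log of k(y) = p2 y^2 (1 + p1 / (p2 y)), written to stay finite for huge y
        log_y = 2.0 * log_y + math.log(P2 + P1 * math.exp(-log_y))
    return log_y / (s**RHO * D**n_iter)


values = [L_estimate(s) for s in (1.0, 1.2, 1.4, 1.6)]
print(values)
```

Because $\psi(sx) = k[\psi(s)]$ and $x^{\rho} = d$, the estimates at $s = 1$ and $s = x$ must agree in the limit, which gives a simple consistency check on the computation.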





4. Some properties of $G(u)$. Since it will be convenient to work with $H(u)$ rather than $G(u)$, we state the content of Theorems 4.1, 4.2, and 4.3 in terms of $G(u)$: $G(u) = a + \int_0^u g(v)\,dv$ for $u > 0$. The density $g(u)$ is continuous for $u \ne 0$. If $Ez_1^{\kappa} < \infty$ then $g^{(r)}(u)$ is continuous for $u \ne 0$ provided $r < \gamma + \kappa - 1$ and is continuous for $u = 0$ provided $r < \gamma - 1$. Near $u = 0$, $G(u)$, provided $\gamma < \infty$, approximates, in a certain mean sense made clear by Theorem 4.2, the function $a + \dfrac{(1-a)^{\gamma+1}u^{\gamma}}{\Gamma(\gamma + 1)}M[u(1 - a)]$, where for convenience we have defined $M(u)$ for positive $u$ by $M(u) = M(-u)$. It is then shown that in a certain sense $g(u)$ goes to zero faster than $\exp(-u^{Q-\epsilon})$ and slower than $\exp(-u^{Q+\epsilon})$ where $\epsilon$ is any positive number, $Q$ being defined in Theorem 4.3. A conjecture is given of a more precise result, applicable when $f(s)$ is a polynomial: in the same sense $g(u)$ goes to zero (more, less) rapidly than ($\exp[-(A^* - \epsilon)u^Q]$, $\exp[-(A^* + \epsilon)u^Q]$), where $A^*$ is defined in the conjecture.

Definition 4.1. Let $H'(u) = h(u)$.

Theorem 4.1. $H(u)$ is absolutely continuous.

Theorem 3.3 shows that $H(u)$ is continuous; see [13], p. 25. This incidentally shows that $G(0) = a$. If $\gamma > \frac{1}{2}$ the absolute continuity of $H(u)$ follows from the Plancherel theorem. See any text on Fourier transforms. In any case, define the functions

$h_m(u) = \dfrac{1}{2\pi}\int_{-m}^{m} e^{-iut}\psi(it)\,dt$,  $m = 1, 2, \cdots$.

An integration by parts$^4$ gives for $u \ne 0$

(4.1)    $h_m(u) = -\dfrac{1}{2\pi iu}\left[\psi(im)e^{-ium} - \psi(-im)e^{ium}\right] + \dfrac{1}{2\pi iu}\int_{-m}^{m} e^{-iut}\,\dfrac{\partial\psi(it)}{\partial t}\,dt$.

If $0 < u_1 \le u \le u_2$, (4.1), (3.4), and (3.6) show that the continuous functions $h_m(u)$ converge uniformly in $[u_1, u_2]$ to a continuous function $h(u)$. Moreover

(4.2)    $H(u_2) - H(u_1) = \lim_{m \to \infty}\int_{-m}^{m}\dfrac{e^{-iu_2 t} - e^{-iu_1 t}}{-2\pi it}\,\psi(it)\,dt = \lim_{m \to \infty}\int_{u_1}^{u_2} h_m(u)\,du = \int_{u_1}^{u_2} h(u)\,du$,

the first equality in (4.2) following from [13], p. 28 and the second from the fact that the $h_m(u)$ are uniformly bounded for $u_1 \le u \le u_2$. In case $Ez_1^{\kappa} < \infty$ and $r < \gamma + \kappa - 1$, repeated integration by parts of (4.1) and reference to remark (b), Theorem 3.3, shows that the first $r$ derivatives of $h(u)$ are continuous if $u \ne 0$. The usual integral expression for $h(u)$ in terms of $\psi(it)$ shows that $\gamma > r + 1$ implies $h^{(r)}(u)$ is continuous at 0.


$^4$ I am indebted to J. W. Tukey for this suggestion, which simplifies the original proof.





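Corollary to the continuity of $H(u)$: the numbers $p_{nr} = P(z_n = r) \to 0$ uniformly in $r$, $r \ge 1$, as $n \to \infty$. We have

$p_{nr} = \left[G_n\left(\dfrac{r}{x^n}\right) - G\left(\dfrac{r}{x^n}\right)\right] + \left[G\left(\dfrac{r}{x^n}\right) - G\left(\dfrac{r-1}{x^n}\right)\right] + \left[G\left(\dfrac{r-1}{x^n}\right) - G_n\left(\dfrac{r-1}{x^n}\right)\right]$.

The desired result follows because $G_n(u) \to G(u)$ uniformly for $u \ge 0$ and because $G(u)$ must be uniformly continuous for $0 \le u < \infty$ (right-continuity at 0).
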
We next consider the behavior of $H(u)$ near $u = 0$, when $\gamma < \infty$. Theorem 3.3 suggests what sort of result may be expected. If the function $M(s)$ of Theorem 3.3 were a constant $M$ it would follow from a Tauberian theorem due to Karamata (see [14], pp. 189-192) that $H(u) \sim \dfrac{Mu^{\gamma}}{\Gamma(\gamma + 1)}$ as $u \to 0+$, or $\dfrac{H(u)}{u^{\gamma+1}} \sim \dfrac{M}{u\,\Gamma(\gamma + 1)}$. Integrating both sides of this relation from $u$ to $ux$ would give

(4.3)    $\int_u^{ux}\dfrac{H(v)\,dv}{v^{\gamma+1}} \sim \dfrac{1}{\Gamma(\gamma + 1)}\int_1^x \dfrac{M\,dv}{v}$.

The analogue of (4.3) turns out to be true, as shown by Theorem 4.2, which shows that in a certain mean sense, $H(u)$ behaves like $\dfrac{u^{\gamma}M(u)}{\Gamma(\gamma + 1)}$. (We defined $M(u) = M(-u)$ for $u > 0$.)

Theorem 4.2.

$\lim_{u \to 0+}\int_u^{ux}\dfrac{H(v)\,dv}{v^{\gamma+1}} = \dfrac{1}{\Gamma(\gamma + 1)}\int_1^x \dfrac{M(v)\,dv}{v}$.

The proof, which follows directly along the lines of the proof of Karamata's theorem, is sketched briefly in section 9, for a somewhat more general situation. A corollary of Theorem 4.2 is that if $\gamma < 1$, $h(u)$ cannot be bounded as $u \to 0+$, for $h(u) \le K$ implies

$\lim_{u \to 0+}\int_u^{ux}\dfrac{K \cdot v\,dv}{v^{\gamma+1}} \ge \dfrac{1}{\Gamma(\gamma + 1)}\int_1^x \dfrac{M(v)\,dv}{v}$,  or  $\lim_{u \to 0+} u^{1-\gamma} > 0$,

which implies $\gamma \ge 1$. An example to be given in section 5 shows that if $\gamma = 1$, $h(u)$ is at least in certain cases bounded but discontinuous at 0.

In order to consider the behavior of $H(u)$ as $u \to \infty$ we first prove a theorem which applies to any distribution whose m.g.f. is an entire function.

Theorem 4.3.$^5$ Let $F(u)$ be any c.d.f. whose m.g.f. $\psi(s)$ is entire. Let $\rho$ be the order of $\psi(s)$. Let $Q$ be defined by

$Q = \text{l.u.b. } q : \int_0^{\infty} e^{u^q}\,dF(u) < \infty$.

Then $\dfrac{1}{\rho} + \dfrac{1}{Q} = 1$.

$^5$ Before completing the present proof, the writer communicated this result to R. P. Boas, Jr., who sent back a proof along different lines.

The proof is given in section 9.

Combining Theorems 3.4 and 4.3, we obtain immediately

Theorem 4.4. Let $Q = \text{l.u.b. } q : \int_0^{\infty} e^{u^q}h(u)\,du < \infty$. Then $Q = \dfrac{\rho}{\rho - 1}$.

Here $\rho$ is given by definition 3.4. If $f(s)$ is not a polynomial, whether entire or not, the proof of theorem 4.3 will show that $Q = 1$, and we interpret theorem 4.4 in that sense. The trivial case $f(s) = s^d$ is excluded, so $\rho > 1$.

Conjecture. Let $\psi(s)$ of theorem 4.3 be of finite order $\rho$ and of type $C$, $0 < C < \infty$. Let $Q = \dfrac{\rho}{\rho - 1}$ and let $A = \text{l.u.b. } A' : \int_0^{\infty} e^{A'u^Q}\,dF(u) < \infty$. Then $[C\rho]^Q[AQ]^{\rho} = 1$.

The proof for the case $\rho$ rational follows the same lines as the proof of Theorem 4.3; a general proof has not been found. If the conjecture is true then having determined $\rho$ and $Q$, when $f(s)$ is a polynomial, and having estimated $C$ by the procedure indicated following the corollary to theorem 3.4, we obtain

(4.4)    $A = \dfrac{1}{Q}(C\rho)^{-Q/\rho}$

for the l.u.b. of the numbers $A'$ such that $\int_0^{\infty} e^{A'u^Q}h(u)\,du < \infty$. The corresponding number $A^*$ which applies to $g(u)$ is given by

(4.5)    $A^* = A(1 - a)^Q$.

5. Some special cases. In this section we shall discuss some special cases in which the m.g.f. $\varphi(s)$ and the c.d.f. $G(u)$ may be determined explicitly. For these cases and for certain others there is a close relationship between the simple discrete branching process and another type of model to be discussed in section 8. Finally a numerical computation of the distribution $G(u)$ will be given for a particular case where $f(s)$ is a second degree polynomial.

Suppose $f(s)$ has the form

$f(s) = 1 - \dfrac{x}{\alpha} + \dfrac{x}{\alpha}\left(\dfrac{1}{1 + \alpha - \alpha s}\right)$

with $x > 1$, $\alpha \ge x - 1$, where $f'(1) = x$ and $f''(1) + f'(1) = Ez_1^2 = x(1 + 2\alpha)$. It is easily verified (as pointed out by Poincaré in [11]) that the solution of the equation $\varphi(sx) = f[\varphi(s)]$ is given by $\varphi(s) = 1 + \dfrac{(x-1)s}{x - 1 - \alpha s}$ with $\varphi(0) = \varphi'(0) = 1$. The number $a$ satisfying $a = f(a)$ is given by $a = \dfrac{\alpha + 1 - x}{\alpha}$. The functions $\psi(s)$ and $k(s)$ of section 3 are given by $\psi(s) = \dfrac{1}{1 - s}$, $k(s) = \dfrac{s}{x - (x-1)s}$.




The number $\gamma$ of Theorem 3.3 is 1. The density function $h(u)$ (definition 4.1) is simply $e^{-u}$, as seen by direct calculation. The number $Q$ of Theorem 4.3 is 1, as it should be, since $f(s)$ is not an entire function. The c.d.f. $H(u)$ is $1 - e^{-u}$, and $H(u) \sim u$ near $u = 0$, in agreement with Theorem 4.2. Various aspects of the case $f(s) = \dfrac{As + B}{Cs + D}$ have been discussed by numerous authors.
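The closed forms of this fractional linear case lend themselves to a quick numerical check. The sketch below assumes the forms $f(s) = 1 - x/\alpha + (x/\alpha)/(1 + \alpha - \alpha s)$, $\varphi(s) = 1 + (x-1)s/(x - 1 - \alpha s)$, and $k(s) = s/(x - (x-1)s)$; the parameter values $x = 2$, $\alpha = 1.5$ and the test points are arbitrary assumptions chosen for illustration.

```python
X, ALPHA = 2.0, 1.5               # assumed parameters with ALPHA >= X - 1


def f(s):
    """Fractional linear offspring generating function."""
    return 1.0 - X / ALPHA + (X / ALPHA) / (1.0 + ALPHA - ALPHA * s)


def phi(s):
    """Candidate m.g.f. of w for Re(s) <= 0."""
    return 1.0 + (X - 1.0) * s / (X - 1.0 - ALPHA * s)


def k(s):
    """Generating function k(s) = (f[(1 - a)s + a] - a) / (1 - a)."""
    return (f((1.0 - A) * s + A) - A) / (1.0 - A)


A = (ALPHA + 1.0 - X) / ALPHA     # extinction probability a = f(a)

s = -0.7                          # assumed test point
residual_functional = phi(X * s) - f(phi(s))      # (2.7): phi(sx) = f[phi(s)]
residual_fixed_point = f(A) - A                   # a = f(a)
residual_k = k(0.3) - 0.3 / (X - (X - 1.0) * 0.3) # k(s) = s / (x - (x - 1)s)
mass_at_zero = phi(-1e12)                         # phi(s) -> G(0) as s -> -infinity
print(residual_functional, residual_fixed_point, residual_k, mass_at_zero, A)
```

The last quantity illustrates the statement $G(0) = a$: the m.g.f. tends to the mass at the origin as $s \to -\infty$.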

Somewhat more generally, we may consider generating functions of the form

(5.1)    $k(s) = s[x - (x - 1)s^m]^{-1/m}$,  $x > 1$.

The function $k(s)$ is a generating function if and only if $m$ is a non-negative integer. In this case we have $\varphi(s) = \psi(s) = (1 - ms)^{-1/m}$ and

$g(u) = h(u) = \dfrac{1}{m^{1/m}\,\Gamma(1/m)}\,u^{(1/m)-1}e^{-(u/m)}$.

Here $\gamma = \dfrac{1}{m}$, and we note that unless $m = 1$ the density function $h(u)$ is unbounded near $u = 0$. A physical interpretation for this case will be given in section 8.

As a numerical illustration we consider the case $f(s) = 0.4s + 0.6s^2$. We have $x = Ez_1 = 1.6$ and $\sigma^2 = E(z_1 - x)^2 = 0.24$. For the asymptotic distribution, $Ew = 1$, $E(w - 1)^2 = \dfrac{\sigma^2}{x^2 - x} = 0.25$. The number $\gamma = \log_{1.6} 2.5 = 1.9495$ so that $\psi(s)$, which is identical with $\varphi(s)$ in this case, is $O(|s|^{-1.9495})$ as $|s|$ goes to $\infty$ with $\mathrm{Re}(s) \le 0$. This implies that the c.d.f. $H(u)$, and likewise $G(u)$, since the two are equal here, behaves like $[1/\Gamma(1 + \gamma)]M(u)$ times $u^{1.9495}$ near $u = 0$, where the "behavior" is in the sense of Theorem 4.2. Numerical determination of $M(u)$ would not be difficult. The number $\rho$ of Theorem 4.4 is given by $\log_{1.6} 2 = 1.4748$. This means that $\psi(s)$ is an entire function of order 1.4748 and hence that the density function $h(u)$ goes to zero more rapidly than $e^{-u^{Q-\epsilon}}$ and less rapidly than $e^{-u^{Q+\epsilon}}$ for any $\epsilon > 0$, where $Q = \dfrac{\rho}{\rho - 1} = 3.1061$, and "more rapidly" is used in the sense of Theorem 4.4.

The function $L(s) = \lim \dfrac{\log k_n[\psi(s)]}{s^{\rho}d^n}$ was computed for four values of $s$ between $s = 1$ and $s = x = 1.6$; in each case the value was 0.744625 so that it appears likely that here $L(s)$ is constant. Hence $C = \mathrm{Max}\, L(s) = 0.744626$ and the quantity $A$ defined by (4.4) is 0.26430. Thus the conjecture following theorem 4.4 indicates that $\int_0^{\infty} g(u)e^{(0.26430 \pm \epsilon)u^{3.1061}}\,du$ is (divergent, convergent) according as the $+$ or $-$ sign holds.
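The constants quoted in this illustration can be recomputed from their definitions. The following sketch is only a check under stated assumptions: it takes the reported value $C = 0.744625$ as given and obtains $A$ by solving the conjectured relation $[C\rho]^Q[AQ]^{\rho} = 1$ for $A$. Small differences in the last decimal place are to be expected, since the printed $Q$ appears to have been computed from the rounded value of $\rho$.

```python
import math

P1, P2 = 0.4, 0.6                       # f(s) = 0.4 s + 0.6 s^2
x = P1 + 2.0 * P2                       # mean number of offspring
sigma2 = (P1 + 4.0 * P2) - x**2         # Var(z_1) = E z_1^2 - x^2
var_w = sigma2 / (x**2 - x)             # E(w - 1)^2 from (2.6)

q1 = P1                                 # here a = 0, so q_1 = f'(0) = p_1
gamma = math.log(1.0 / q1) / math.log(x)    # definition 3.3
rho = math.log(2.0) / math.log(x)           # definition 3.4 with d = 2
Q = rho / (rho - 1.0)                       # theorem 4.4

C = 0.744625                            # type of psi(s), value reported in the text
# Solve (C rho)^Q (A Q)^rho = 1 for A (conjectured relation, assumed here).
A = (C * rho) ** (-Q / rho) / Q

print(x, sigma2, var_w)
print(gamma, rho, Q, A)
```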

Through the kindness of Mr. Cecil Hastings of the Douglas Aircraft Company, the c.d.f. $G(u)$ was computed for this case. The coefficients in the power series expansion of $\varphi(s)$ were obtained from the functional equation (3.1) and $G(u)$ was then obtained by inverting $\varphi(it)$. The values of $G(u)$ are given in Table I.



TABLE I

$G(u)$, the limiting probability that $z_n/x^n \le u$ for the case $f(s) = 0.4s + 0.6s^2$

    u        G(u)
   0.00     .00000
   0.25     .04753
   0.50     .17275
   0.75     .34550
   1.00     .53117
   1.25     .69932
   1.50     .83042
   1.75     .91857
   2.00     .96781
   2.50     .99761
   3.00     .99993


6. Number of generations to extinction. It was pointed out in section 2 that when $x \le 1$ the probability is 1 that $z_n = 0$ for some integer $n$. We assume throughout section 6 that $x < 1$.

Definitions 6.1. Let the random variable $N$ be the smallest integer $n$ such that $z_{n+1} = 0$. Define the moment-generating function of $N$ by

$\theta(s) = \sum_{n=0}^{\infty} e^{ns}P(N = n)$.

Clearly $P(N = n) = p_{n+1,0} - p_{n0}$, so that $\theta(s) = \sum_{n=0}^{\infty} e^{ns}(p_{n+1,0} - p_{n0})$.

Definitions 6.2. Let $b_n = 1 - p_{n+1,0}$, with $b_0 = 1 - p_0$. The numbers $b_n$ satisfy the recursive relation

(6.1)    $b_{n+1} = 1 - f(1 - b_n)$.

Define the function $\theta_1(s)$ by

$\theta_1(s) = \sum_{n=0}^{\infty} b_n e^{ns}$.

We see that

(6.2)    $\theta(s) = 1 + (e^s - 1)\theta_1(s)$,

so that it suffices to determine the function $\theta_1(s)$.

The function $\theta_1(s)$ belongs to a type which has been studied by Fatou [15] and Lattès [16]. If we let $e^s = z$ we see that $\theta_1(z)$ is a power series whose coefficients are successive iterates of the function $f^*(b) = 1 - f(1 - b)$; i.e., $b_{n+1} = f^*(b_n) = f^*_{n+1}(b_0)$, where $f^*(0) = 0$, $f^{*\prime}(0) = x < 1$. It was shown by Fatou that a function of this sort is meromorphic with poles at $s = -n\log x$, $n = 1, 2, \cdots$. An expansion for $\theta_1(s)$ in the form

$\theta_1(s) = \dfrac{\mu_1 y_0}{1 - xe^s} + \dfrac{\mu_2 y_0^2}{1 - x^2e^s} + \dfrac{\mu_3 y_0^3}{1 - x^3e^s} + \cdots$

was obtained by Lattès, the expansion converging everywhere except at the poles. The quantities $\mu_r$ and $y_0$ are defined as follows: the function $\mu(s) = \mu_1 s + \mu_2 s^2 + \cdots$ is determined by the functional equation $\mu(sx) = f^*[\mu(s)]$ with the condition $\mu'(0) = \mu_1 = 1$. The number $y_0$ is determined by $\mu(y_0) = b_0 = 1 - p_0$. Perhaps the easiest way to determine $y_0$ is to use the fact that the inverse function $\mu^{-1}(s)$ satisfies the functional equation $\mu^{-1}[f^*(s)] = x\mu^{-1}(s)$, from which we can determine the power series for $\mu^{-1}(b_0)$.

Since the use of Lattès' expansion requires finding the expansions of $\mu(s)$ and $\mu^{-1}(s)$, we now give another method, giving a different kind of expansion; this method appears particularly adapted to the case here illustrated, where $f(s)$ is of the second degree. Then (6.1) becomes

(6.3)    $b_{n+1} = xb_n - p_2b_n^2$,  $b_0 = 1 - p_0$.

Definition 6.3. The functions $\theta_k(s)$, $k = 1, 2, \cdots$, are given by

(6.4)    $\theta_k(s) = \sum_{n=0}^{\infty}(b_n)^k e^{ns}$.

If we raise both sides of (6.3) to the $k$th power, multiply both sides by $e^{ns}$, sum on $n$ from 0 to $\infty$, and solve for $\theta_k(s)$, we obtain

(6.5)    $\theta_k(s) = \dfrac{b_0^k e^{-s} + \sum_{j=1}^{k}\binom{k}{j}(-p_2)^j x^{k-j}\theta_{k+j}(s)}{e^{-s} - x^k}$.

(Justification for the rearrangement of series will come out of the subsequent proof.) If we put $k = 1$ in (6.5) we obtain

(6.6)    $\theta_1(s) = \dfrac{b_0e^{-s} - p_2\theta_2(s)}{e^{-s} - x}$.

Definitions 6.4. We define recursively sequences of functions $S_n(s)$ and $R_n(s)$, such that for each $n$, $\theta_1(s) = S_n(s) + R_n(s)$. Let

$S_1(s) = \dfrac{b_0e^{-s}}{e^{-s} - x}$,    $R_1(s) = -\dfrac{p_2\theta_2(s)}{e^{-s} - x}$.

Suppose now that $R_n(s)$ is of the form $A_{n1}\theta_{n+1}(s) + \cdots + A_{nn}\theta_{2n}(s)$, the $A_{nj}$ being functions of $s$, $p_2$, and $x$, but not explicitly of $b_0$, while $S_n(s)$ is a rational function of $e^{-s}$, $p_2$, and $x$, and a polynomial of degree $n$ in $b_0$. Now put $k = n + 1$ in (6.5) and substitute the expression obtained for $\theta_{n+1}(s)$ into $R_n(s)$. Collecting terms we now define $R_{n+1}(s)$ as the sum of terms involving $\theta_{n+2}(s), \cdots, \theta_{2n+2}(s)$: $R_{n+1}(s) = A_{n+1,1}\theta_{n+2}(s) + \cdots + A_{n+1,n+1}\theta_{2n+2}(s)$; then $S_{n+1}(s) = \theta_1(s) - R_{n+1}(s)$ is a rational function of $e^{-s}$, $p_2$, and $x$, and a polynomial of degree $n + 1$ in $b_0$.

Theorem 6.1. Let $f(s) = p_0 + p_1s + p_2s^2$ with $x < 1$. Suppose that $x + p_2b_0 < 1$. Then the functions $S_n(s)$ converge to $\theta_1(s)$ in a neighborhood of $s = 0$.

The restriction $x + p_2b_0 < 1$ may fail to hold. However this is not a serious restriction; we pick a value of $n$ so that $x + p_2b_n < 1$. Then

$\theta_1(s) = b_0 + \cdots + b_{n-1}e^{(n-1)s} + e^{ns}\theta_1^*(s)$,

where $\theta_1^*(s) = \sum_{r=0}^{\infty} b_{n+r}e^{rs}$ is the same type of function as $\theta_1(s)$; theorem 6.1 is then applicable to $\theta_1^*(s)$.

If the conditions of theorem 6.1 are satisfied, we have

(6.7)    $\theta_1(s) = b_0e^{-s}[\pi_1(s, x) - p_2b_0\pi_2(s, x) + 2xp_2^2b_0^2\pi_3(s, x) - p_2^3b_0^3(e^{-s} + 5x^3)\pi_4(s, x) + \cdots]$,

where $\pi_k(s, x) = \prod_{r=1}^{k}\left(\dfrac{1}{e^{-s} - x^r}\right)$. Since $E(N) = \theta'(0) = \theta_1(0)$ and $E(N^2) = \theta''(0) = 2\theta_1'(0) + \theta_1(0)$, we have

$E(N) = b_0[\pi_1(0, x) - p_2b_0\pi_2(0, x) + 2xp_2^2b_0^2\pi_3(0, x) - p_2^3b_0^3(1 + 5x^3)\pi_4(0, x) + \cdots]$,

$E(N^2) = -E(N) + 2b_0[\pi_1'(0, x) - p_2b_0\pi_2'(0, x) + 2xp_2^2b_0^2\pi_3'(0, x) - (5x^3 + 1)p_2^3b_0^3\pi_4'(0, x) + p_2^3b_0^3\pi_4(0, x) + \cdots]$,

where $\pi_k'(0, x) = \pi_k(0, x)\sum_{r=1}^{k}\dfrac{1}{1 - x^r}$.
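The expansion for $E(N)$ can be compared with the direct sum $E(N) = \theta_1(0) = \sum_n b_n$, computed from the recursion (6.3). The sketch below is illustrative: the offspring law $p_0 = 0.5$, $p_1 = 0.3$, $p_2 = 0.2$ is an assumed example satisfying $x + p_2b_0 < 1$, and only the four terms displayed in (6.7) are used, so the two values agree only approximately.

```python
def mean_generations_direct(p0, p1, p2, n_terms=2000):
    """E(N) = theta_1(0) = sum of b_n, with b_{n+1} = x b_n - p2 b_n^2 and b_0 = 1 - p0."""
    x = p1 + 2.0 * p2
    b = 1.0 - p0
    total = 0.0
    for _ in range(n_terms):
        total += b
        b = x * b - p2 * b * b
    return total


def mean_generations_series(p0, p1, p2):
    """First four terms of the expansion of E(N) obtained from (6.7) at s = 0."""
    x = p1 + 2.0 * p2
    b0 = 1.0 - p0
    pi = []                       # pi[k-1] = pi_k(0, x) = prod_{r<=k} 1 / (1 - x^r)
    prod = 1.0
    for r in range(1, 5):
        prod /= (1.0 - x**r)
        pi.append(prod)
    bracket = (pi[0]
               - p2 * b0 * pi[1]
               + 2.0 * x * p2**2 * b0**2 * pi[2]
               - p2**3 * b0**3 * (1.0 + 5.0 * x**3) * pi[3])
    return b0 * bracket


P0, P1, P2 = 0.5, 0.3, 0.2        # assumed example: x = 0.7, x + p2 * b0 = 0.8 < 1
direct = mean_generations_direct(P0, P1, P2)
series = mean_generations_series(P0, P1, P2)
print(direct, series)
```

The gap between the two numbers is the contribution of the omitted higher-order terms in $p_2 b_0$; it shrinks as $x + p_2 b_0$ moves away from 1.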

We now prove that if $x + p_2b_0 < 1$, the expansion (6.7) is valid in some neighborhood of $s = 0$. We shall denote the particular values of $x$, $p_2$, and $b_0$ with which we are dealing by $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$. Now let $x$, $p_2$, and $b_0$ be three complex numbers, arbitrary except for the following restrictions:

(6.8)    $|x| + |p_2| < 1$,  $|b_0| < 1$,

and define the numbers $b_n$ in terms of $b_0$, $x$, and $p_2$ by means of (6.3), with $\theta_k(s)$ defined by (6.4).

We first show that (6.7) is valid if (6.8) holds, and then show that the domain of validity also includes the original numbers $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$, provided $\bar{x} + \bar{p}_2\bar{b}_0 < 1$.

If (6.8) is satisfied, we have $|b_n| \le A|x|^n$ where $A$ is a positive constant. Now suppose $1 < T < \dfrac{1}{|x|}$. Then the series defining $\theta_k(s)$, $k = 1, 2, \cdots$, are uniformly and absolutely convergent in the domain $|e^s| \le T$. Moreover, if $|x| + |p_2| = \Delta < 1$, we have $|b_n| \le |b_0|\Delta^n$ whence, if $k$ is an integer large enough so that $T\Delta^k < \frac{1}{2}$,

(6.9)    $|\theta_k(s)| < 2|b_0|^k$

for $|e^s| \le T$. In what follows, we assume $|e^s| \le T$. Now write $\theta_1(s) = S_n(s) + \sum_{j=1}^{n} A_{nj}(p_2, x, s)\theta_{n+j}(s)$, where $n$ is large enough so that $T\Delta^n < \frac{1}{2}$. Let $A_n(p_2, x, s) = \mathrm{Max}_j\,|A_{nj}(p_2, x, s)|$. Passing to the next stage we see that $A_{n+1} \le A_n + \dfrac{A_n\Delta^{n+1}}{|e^{-s} - x^{n+1}|} = A_n\left(1 + \dfrac{\Delta^{n+1}}{|e^{-s} - x^{n+1}|}\right)$. Hence the numbers $A_n$ are bounded. This fact, together with (6.9), shows that $\lim R_n(s) = 0$.

Now suppose that $x$ and $b_0$ have their original values $\bar{x}$ and $\bar{b}_0$ while $p_2$ is small enough in absolute value so that $\bar{x} + |p_2| < 1$. In this case $\lim S_n(s) = \theta_1(s)$. We observe that $S_n(s)$ is a polynomial of degree $n - 1$ in $p_2$ and that $S_{n+1}(s)$ is obtained from $S_n(s)$ by adding a single term of degree $n$ in $p_2$. Thus $\theta_1(s)$ has been expressed as a power series in $p_2$. Now consider $\theta_1(s)$ as a function of $p_2$, with $b_0 = \bar{b}_0$, $x = \bar{x}$. If $\bar{x} + \bar{b}_0|p_2| < 1$, we have $b_n = O[(\bar{x} + \bar{b}_0|p_2|)^n]$. Thus $\theta_1(s)$ is analytic in $p_2$ for $|p_2| < \dfrac{1 - \bar{x}}{\bar{b}_0}$ and the expansion in (6.7), being a power series in $p_2$, must be valid when $\bar{x} + \bar{p}_2\bar{b}_0 < 1$.


7. Estimation of parameters. Until now we have assumed that the parameters $p_r$ are known numbers. We may wish, however, to estimate them, having observed the numbers $z_1, z_2, \cdots, z_{n+1}$. In order to get simple maximum likelihood estimates for the $p_r$, it appears necessary to introduce certain auxiliary random variables.

Definitions 7.1. Let $z_{mk}$ be the number of individuals in the $m$th generation who have exactly $k$ descendents in the $(m + 1)$st generation. Let $Z_n = \sum_{m=0}^{n} z_m$.

Theorem 7.1. Maximum likelihood estimates of $p_r$ and $x$, based on observed values of $z_{mk}$ for $m \le n$, are respectively,

$\hat{p}_r = \sum_{m=0}^{n} z_{mr}/Z_n$;    $\hat{x} = (Z_{n+1} - 1)/Z_n$.

(Note that the estimate $\hat{x}$ involves only $z_1, \cdots, z_{n+1}$.)

If $z_m$ is fixed the joint conditional probability function of $z_{m0}, z_{m1}, \cdots$, is $\dfrac{z_m!}{\prod_r z_{mr}!}\prod_r (p_r)^{z_{mr}}$. Thus the joint probability function of the $z_{mr}$ for $m = 0, 1, \cdots, n$, and $r = 0, 1, 2, \cdots$, is given by the product of two factors, one of which is independent of the $p_r$, the logarithm of the other being $\sum_r\left(\sum_{m=0}^{n} z_{mr}\right)\log p_r$. The value of this expression is clearly maximized by $p_r = \hat{p}_r$ as given above. Since $\sum_r z_{mr} = z_m$ and $\sum_r rz_{mr} = z_{m+1}$, the quantity $\sum_r r\hat{p}_r$ gives $\hat{x}$ as above.
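The estimates of Theorem 7.1 are easy to compute from simulated data. The sketch below is an illustration only: it simulates the counts $z_{mk}$ for generations $m = 0, \cdots, n$ starting from a single individual, then forms $\hat{p}_r$ and $\hat{x}$. The offspring law, the number of generations, the seed, and all function names are assumptions.

```python
import random


def simulate_counts(p, n, rng):
    """Simulate z_{mk} for m = 0..n; return (counts, sizes) with sizes = [z_0, ..., z_{n+1}]."""
    sizes = [1]                               # z_0 = 1: a single ancestor
    counts = []
    for _ in range(n + 1):
        row = [0] * len(p)
        for _ in range(sizes[-1]):
            k = rng.choices(range(len(p)), weights=p)[0]
            row[k] += 1                       # one more individual with exactly k descendents
        counts.append(row)
        sizes.append(sum(k * c for k, c in enumerate(row)))
    return counts, sizes


def ml_estimates(counts, sizes):
    """p_hat_r = sum_m z_{mr} / Z_n and x_hat = (Z_{n+1} - 1) / Z_n."""
    Z_n = sum(sizes[:-1])
    Z_n1 = sum(sizes)
    p_hat = [sum(row[r] for row in counts) / Z_n for r in range(len(counts[0]))]
    x_hat = (Z_n1 - 1) / Z_n
    return p_hat, x_hat


rng = random.Random(0)                        # fixed seed, assumed for reproducibility
counts, sizes = simulate_counts([0.2, 0.3, 0.5], n=12, rng=rng)
p_hat, x_hat = ml_estimates(counts, sizes)
print(p_hat, x_hat)
```

Whatever path the simulation takes, the identities $\sum_r \hat{p}_r = 1$ and $\hat{x} = \sum_r r\hat{p}_r$ hold exactly, which mirrors the last step of the proof.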

Although the estimates $\hat{p}_r$ are the same as we would obtain if we were dealing with $Z_n$ trials from a multinomial distribution with probabilities $p_r$, the joint distribution of the quantities $\sum_{m=0}^{n} z_{mr}$, $r = 0, 1, \cdots$, is not multinomial. For example, if $Z_n > 1$ the probability of the event

$\left\{\sum_{m=0}^{n} z_{m0} = Z_n,\ \sum_{m=0}^{n} z_{mr} = 0 \text{ for } r \ne 0\right\}$ is 0.


We shall next show that the estimate $\hat{x}$ is, in a certain sense, consistent.

Theorem 7.2. If $x > 1$, the random variables $Z_{n+1}/Z_n$ converge in probability to the random variable $xV^*$ where $V^* = \dfrac{1}{x}$ if $w = 0$ and $V^* = 1$ if $w \ne 0$.

If $w \ne 0$ then for all $n$, $z_n \ne 0$ and $1/Z_n \to 0$ as $n \to \infty$. Hence in this case $(Z_{n+1} - 1)/Z_n$ converges to $x$ if $Z_{n+1}/Z_n$ does. On the other hand, $P(w = 0) = a = P(z_n = 0$ for some $n)$, so that if $w = 0$, $Z_{n+1}/Z_n = 1$ with probability 1 for $n$ large enough. Thus we need only show that $Z_{n+1}/Z_n$ converges to $x$ if $x > 1$ and $w \ne 0$.

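We need the following:

Lemma 7.1. If $x > 1$, the random variables $Z_n/x^n$ converge in probability to $\dfrac{wx}{x - 1}$.

Since

(7.1)    $\dfrac{wx}{x - 1} - \dfrac{Z_n}{x^n} = \dfrac{w}{x^n(x - 1)} + \sum_{r=0}^{n}\dfrac{(w - w_r)}{x^{n-r}}$,

it suffices to show that $\lim_{n \to \infty}\left(\dfrac{x}{x - 1}\right)^2\dfrac{1}{x^{2n+2}}E(w^2) = 0$ and $\lim_{n \to \infty} E\left(\sum_{r=0}^{n}\dfrac{w - w_r}{x^{n-r}}\right)^2 = 0$. The truth of the first statement is obvious, since $Ew^2$ is finite. It follows from (2.5) that $E(w_rw_s) = Ew_r^2$ if $s > r$, $E(ww_r) = \lim_{n \to \infty} E(w_nw_r) = Ew_r^2$, whence $E(w - w_r)^2 = \dfrac{\sigma^2}{(x^2 - x)x^r}$ and $E[(w - w_r)(w - w_s)] = \dfrac{\sigma^2}{(x^2 - x)x^s}$ if $s > r$. Then

$E\left(\sum_{r=0}^{n}\dfrac{w - w_r}{x^{n-r}}\right)^2 = \dfrac{\sigma^2}{x^2 - x}\sum_{r=0}^{n}\sum_{s=0}^{n}\dfrac{x^{\min(r,s)}}{x^{2n}}$,

and this quantity clearly approaches 0 as $n \to \infty$, proving Lemma 7.1.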





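Define the random variables $w^*$ and $V_n$ as follows:

$w^* = w$ when $w \ne 0$;    $w^* = 1$ when $w = 0$;

$V_n = \dfrac{Z_n}{x^n}$ when $z_n \ne 0$;    $V_n = \dfrac{x}{x - 1}$ when $z_n = 0$.

It is clear that the $V_n$ converge in probability to $w^*\dfrac{x}{x - 1}$, and we note that the c.d.f. of $w^*$ is continuous at $w^* = 0$. Hence, for $\epsilon > 0$,

$\lim_{n \to \infty} P\left(\left|\dfrac{V_{n+1}}{V_n} - 1\right| > \epsilon\right) = \lim_{n \to \infty} P(|V_{n+1} - V_n| - V_n\epsilon > 0) = P\left(-\dfrac{\epsilon w^*x}{x - 1} > 0\right) = 0$.

It follows, under the conditional hypothesis $w \ne 0$, that the variates $Z_{n+1}/Z_n$ converge in probability to $x$, since

$\dfrac{Z_{n+1}}{Z_n} = x\dfrac{V_{n+1}}{V_n}$ when $z_{n+1} \ne 0$.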


8. Continuous models. As mentioned in section 1 there are situations where it is more important to consider the number of individuals existing at a given time than the number in a given generation. Let a set of probabilities $p_r$ be given. The question arises whether we can interpret these as probabilities that an individual will have a given number of descendents at the end of some fixed period of time. We might then suppose that each individual in existence at that time has the same probabilities of having a given number of descendents at the end of the next (equal) length of time, these probabilities being independent of the age of the individual. A model of this sort might be considered in certain fission processes, if the probability of fission is independent of age. It should be noted that the "descendents" of an individual may include the individual. For example, if a bacterium splits in two we may either regard it as having produced two descendents and dying, or as having produced one descendent and itself surviving.

If an interpretation of this sort is to be satisfactory, interpolation in time must be possible. In other words there should exist a family of functions $f_n(s)$ defined for all positive $n$ such that $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$; such that for each positive $n$, $f_n(s)$ is a probability generating function, $f_n(s) = \sum_{r=0}^{\infty} p_{nr}s^r$; and such that for $n = 0, 1, 2, \cdots$ the functions $f_n(s)$ coincide with the iterates $s, f(s), f[f(s)], \cdots$. We may then interpret $f_n(s)$ as the generating function at time $n$. It is readily seen that in general such a family of functions will not exist. For example, if such a family exists we must have $f(s) = n$th iterate of $f_{1/n}(s)$ for arbitrarily large integral $n$, so that $f(s)$ cannot be a polynomial of degree $\ge 2$.

The functional equation shows that $f(s) = \varphi[x\varphi^{-1}(s)]$, whence $f_n(s) = \varphi[x^n\varphi^{-1}(s)]$ for integral $n$. The expression $\varphi[x^n\varphi^{-1}(s)]$ then might be taken as the definition of $f_n(s)$ for all positive $n$. See Hadamard, [9]. The problem of determining whether the functions so defined are a family of generating functions will be discussed in a subsequent paper. We remark, however, that


if $f(s)$ has the form $\dfrac{s}{x - (x - 1)s}$ considered in section 5 then the iterates $f_n(s)$ have the form $\dfrac{s}{x^n - (x^n - 1)s}$; they are clearly generating functions for all positive $n$, satisfying the required relation $f_{n_1}(f_{n_2}) = f_{n_1+n_2}$. Now suppose $g(s)$ is some function such that the function $f(s) = g^{-1}\left[\dfrac{g(s)}{x - (x - 1)g(s)}\right]$ is a generating function for all $x > 1$, with $g(1) = 1$. As pointed out by Ulam and Hawkins, the iterates of functions $f(s)$ of this form are convenient to work with, the $n$th iterate being simply $g^{-1}\left[\dfrac{g(s)}{x^n - (x^n - 1)g(s)}\right]$. In addition, the requirement that $f(s)$ be a generating function for all $x > 1$ shows that the functions $f_n(s)$ are generating functions for all $n > 0$. The simplest function $g(s)$ which satisfies our requirements is $g(s) = s^m$, where $m$ is any positive integer. In this case $f(s)$ has the form considered in (5.1) and $f_n(s) = s[x^n - (x^n - 1)s^m]^{-1/m}$. As $n \to 0$ we have

$f_n(s) = \left(1 - \dfrac{n\log x}{m}\right)s + \dfrac{n\log x}{m}s^{m+1} + O(n^2)$.

We may interpret this as follows. A particle in existence at a given time may, in a short time interval $\Delta t$, either split into $m + 1$ particles, with probability $\dfrac{\Delta t\log x}{m}$, or it may remain unaltered, with probability $1 - \dfrac{\Delta t\log x}{m}$. If it splits each particle produced has the same chances for splitting as its parent, etc. Thus, from the results of section 5, it follows that if we begin with a single particle at time $t = 0$, the asymptotic probability density function for $z_t/x^t$, where $z_t$ is the number of particles at time $t$, is given by $\dfrac{1}{m^{1/m}\,\Gamma(1/m)}\,u^{(1/m)-1}e^{-(u/m)}$.
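The interpolation property of this family can be verified numerically. The sketch below assumes the form $f_n(s) = s[x^n - (x^n - 1)s^m]^{-1/m}$ for real $n > 0$ and checks both the composition rule $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$ and the small-$n$ expansion; the parameter values and the function name are illustrative assumptions.

```python
import math


def f_time(s, n, x, m):
    """Generating function at (possibly fractional) time n for the case g(s) = s^m."""
    xn = x**n
    return s * (xn - (xn - 1.0) * s**m) ** (-1.0 / m)


X, M = 2.0, 3                     # assumed parameters
S = 0.3                           # assumed evaluation point, 0 <= s <= 1

# Composition rule with non-integral times n1 = 1.1 and n2 = 0.7.
composed = f_time(f_time(S, 0.7, X, M), 1.1, X, M)
direct = f_time(S, 1.8, X, M)

# Small-time expansion: probability (n log x)/m of splitting into m + 1 particles.
n_small = 1e-4
rate = n_small * math.log(X) / M
expansion = (1.0 - rate) * S + rate * S ** (M + 1)
exact_small = f_time(S, n_small, X, M)

print(composed, direct)
print(exact_small, expansion)
```

The composition rule holds because $g(f_n) = g/(x^n - (x^n - 1)g)$ is a fractional linear map of $g$, and such maps with these coefficients compose by adding the exponents of $x$.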

It is, of course, customary to begin with the elementary probabilities for a certain number of births in a short time $\Delta t$ and determine the functions $f_n(s)$ from these by means of differential equations. See, for example, Arley, [17]. The results of the present paper can be applied in some cases to the continuous problem even when an explicit determination of the $f_n(s)$ is difficult. A discussion will be given in a later paper.






9. Some proofs. We give in this section proofs for (A) theorem 3.3, (B) theorem 3.4, (C) theorem 4.2, and (D) theorem 4.3; in certain cases we shall indicate slightly more general results.

(A) We make use of a result of Koenigs, in the form applicable here.

Koenigs' theorem: If $|s| \le \lambda < 1$ and $q_1 \ne 0$, then $k_n(s) = q_1^nB(s)[1 + O(q_1^n)]$ where $B(s)$ is analytic for $|s| \le \lambda$ and satisfies the functional equation $B[k(s)] = q_1B(s)$.

Here, $O(q_1^n)$ means bounded by $Aq_1^n$, where $A$ is independent of $s$. We remark that $B(s)$ is not identically 0. The proof of Koenigs' theorem follows readily if we write

$k_n(s) = q_1^n s\prod_{j=1}^{n}\left[1 + \dfrac{J[k_{j-1}(s)]}{q_1}\right]$,  where $J(s) = \dfrac{k(s)}{s} - q_1$.


Now let ti be a positive number such that | \f/{s) \ < 1 when 0 < | s | < h and 
Re(s) < 0. (For the rest of this proof we assume Re(s) < 0) Such a number 
exists; on the imaginary axis we have \l/iit) = 1 + — ^E[(w'f]f + o(f) where 

jB[(idO^] having the distribution branching from fc(s), showing that 

I \l^(ii) 1 < 1 if / 5^ 0 and sufhciently small, while if Re(s) < 0 we refer to the 

expression i/'(s) = [ e''dH{u). Let X = Max | fi{s) \ for t\/x < | s | < h . 

If 1 s 1 > <1 let iV(s) be the smallest integer such that j s < k . Then 
m = gr‘’51'^(s/a:'''*’)]ll + 0(gf“’)] =Bm][l + 0(gf‘'’)]. 

Now B(i/'(sx)] = qiB[\pis)]. Let il7(s) = | s |'’'B[^(s)]. Then Af(sx) = il7(s). 


s/ti I, and theorem 3 3 follows. Clearly 
< ii, and hence, by functional continua- 


AIso logx I s/k I < W(s) < 1 + log* 

M(s )/1 s p is continuous for hx < [ s | 
tion, wherever Re(s) < 0 , s 0 . 

Concerning the remarks following Theorem 3.3 we have the following: 

(a) If Esl < °o, r-fold differentiation of ^(sx") = lCn[’A(®)] gives, for | s | > 

h> 0 . 


(9 1) 




where Q,j is a polynomial in 1 1 

when I s I < X; because of analyticity, the same must be true of | kn\s) \ . 
Putn = iV(s) m (9,1), A (s) being the integer defined above. Since/clr’'[i;'(5/x")] = 
0(gi) = 0((1/ I s I’’’)), remark (a) follows. 

(b) B(s) IS clearly > 0 when s > 0, hence M(s) > 0 when s < 0. Since £(0) = 

0, Bis) 9 ^ 0 foi sufficiently small s 0; smce 1 ^( 5 ) —»■ 0 as | s | , Mis) 7 ^ 0 

for I s 1 sufficiently large, since il7(sx) = ilf(s), remark (b) follows. 

(c) If $\gamma = \infty$, i.e., $q_1 = 0$, then $k_n(s)$ goes to zero with great rapidity as $n \to \infty$,
if $|s| < 1$. The general line of argument is clear.

(B) Let $k(s)$ be a polynomial of degree $d > 1$ with real coefficients, $k(s) =
\cdots + q_d s^d$, with a non-negative double point, $k(a) = a \ge 0$, and such that
$k(s) > s$ when $s > a$. Let $\varphi(s)$ be any solution of the functional equation
$\varphi(ms) = k[\varphi(s)]$ which is continuous for $s > 0$ and satisfies $\varphi(s) > a$
for $s > 0$; here $m$ is any number $> 1$. Then theorem 3.4 holds, with $x$ replaced by $m$.

It is not difficult to show that if a < 5i < s < Sa, lim k,(s) = oo uniformly in s 

Hence ^ w as .5 . Write R(s) = log M + — Then 

\ Qdj~i / 

log log k„l</^(s)] = (1 - oT") log qd/(d - 1 ) + log ri>(s) + ^ 

d~^E(/c,-il<p(s)]), s being taken large enough so that R(hj^il^(s)]) is continuous^ 
Thus, since the functions are bounded, the functions d~’' log 

converge uniformly, for $s$ sufficiently large, to a continuous function $L^*(s)$
satisfying $L^*(ms) = dL^*(s)$. Let $L(s) = s^{-\rho}L^*(s)$, where $\rho = \log_m d$. Theorem 3.4
now follows by an argument similar to that used to conclude theorem 3.3.

(Note that d~7f(/cy_i[^(s)]) = 0(d~")) 

n-hl 

(C) In order to avoid negative signs we work with the Laplace transform instead
of the m.g.f.


e“'“ dff(u) 

be finite for s > 0, Suppose '^(s) = - as 5 —> , where 0 < 7 < 00 ^ 

JH( 5 ) is continuous and satisfies M(sx) = ilf(s) for s > 0, x being some number 


> 1, Then 


ilim f 

u-t0+ Ju 


r" r(T+l) 


rM(v) 

V 


dv. 


Following the lines of the proof of Karamata’s theorem, we see that for any 

y > 0, f s^“hk(s) ds = D + o(l) ass-^ 00 where H = ds: i.e., • 

4 Jl s ‘ ’ J, 

r _ *50 

dsj e dTI(u) = H + o(l), or replacing s by (w -f l)s, / ds f 

dH{u) = D/(n -{■ 1 )"^ -H o(l) = e 'e ’“s’’^^ ds + o(l). It follows as in 

[f^]j PP- 189~192, that if F(u) is any function of bounded variation in (0, 1) we 
have 


J r*’' r” n 

s^-^ds / ^ / e~‘F(s-‘)s^-^ ds. 

Let F{e *) = e* if 0 < s < 1 and 0 otherwise. Then the theorem follows from 
(9 2). ^ 

(D) Theorem 4.3 is true if $F(u)$ is any bounded monotone increasing function.
For simplicity we assume that $F(1) = 0$; it is readily seen that this causes no
loss in generality. The proof is given for the case $1 < p < \infty$; it will be clear
that p = 1 implies Q = oo , while if p = oo (or if f (s) is not entire) (3 = 1 
Suppose m and n are positive integers such that m/n < p/(p - 1 ). Then 

(9.3) f exp («-'•) dF{u) = Z I f m«) < » Z !fc + iW' c,^„. 
^ wjliiu *■= 0 »' -li r-=o (rn )! 





f(0) 

where ct. = — , , interchange of integration and summation are justified by 


fc' 


n 


the positiveness of all terms involved. Suppose 0 < e < " — ( 1 — 


VI 


for fc 


sufficiently large the inequality is satisfied; see [18], p 253. 

Hence using Stirling’s formula, we see that the last series in (9.3) is dominated 
by a series whose rth term, for r sufficiently large, is controlled by the factor 


m(l—(l/p)+«—(rt/fn)) 


Since 1 — - 4-e— — is negative, the series, and hence the 
p m 


integral, converges. We have thus proved 4- - < !■ 

^ P 

♦B—1 ^ 

Conversely, suppose - > — - . Let f(s) = Ji,(s), where fj.(s) = 

n P — 1 A;=0 T-0 

5 *'+’'"^ fc = 0, 1, ■ ■ • , m — 1. At least one of the functions b(s) must be of order 
p We suppose that fo(s) is, if not the argument would need only slight modi¬ 
fications We have 


(9.4) 



exp (u”^") dF(.u) > n 


OQ 


E 


(rm) I Crrn . 
[(r + l)n]r 


Suppose 0 < € < 1 — From [18], p 253, the inequality c,® > 

p m 

must hold for infinitely many values of r. As in the first half of the proof this 
shows that the aeries and the integral in (9.4) diverge. Thus " + q > 1 and 
the proof is complete 

If p is rational, the conjecture following theorem 4.3 can be proved in a similar
manner making use of a relation between the class of an entire function and the
coefficients of its series expansion; see [14], p. 95.


REFERENCES 

[1] R. A. Fisher, The Genetical Theory of Natural Selection, Oxford, the Clarendon Press,
1930.

[2] Alfred J. Lotka, Théorie Analytique des Associations Biologiques, 2, Paris, Hermann,
1930.

[3] J. F. Steffensen, "Deux problèmes du calcul des probabilités," Annales de l'Inst.
Henri Poincaré, Vol. 3 (1933), p. 331.

[4] D. Hawkins and S. Ulam, Theory of Multiplicative Processes, I, Los Alamos
Declassified Document 266, 1944.

[5] A. Kolmogoroff, "Zur Lösung einer biologischen Aufgabe," Communications for Math.
and Mechanics at Tchebycheff University, Tomsk, Vol. 2, part 1 (1938), p. 1; see
Fortschritte der Math., Vol. 64 (1938), part 2, p. 1223.

[6] A. Kolmogoroff and N. A. Dmitriev, "Branching stochastic processes," C. R.
(Doklady) Acad. Sci. URSS (N.S.), Vol. 56 (1947), pp. 5-8. See Math. Rev., Vol. 9,
No. 1 (1948), p. 46.

[7] A. M. Yaglom, "Certain limit theorems of the theory of branching random processes,"
Doklady Akad. Nauk SSSR (N.S.), Vol. 56 (1947), pp. 795-798. See Math. Rev.,
Vol. 9, No. 3 (1948), p. 149.

[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea, 194 .

[9] J. Hadamard, "Two works on iteration and related questions," Am. Math. Soc. Bull.,
Vol. 50 (1941), p. 67.

[10] G. Koenigs, "Nouvelles recherches sur les intégrales de certaines équations
fonctionnelles," Annales Sci. de l'École Normale Sup. de Paris, Vol. 1, series 3, suppl.
(1884), p. 3.

[11] H. Poincaré, "Sur une classe nouvelle de transcendantes uniformes," Journal de Math.
Pures et Appliquées, Vol. 6, 4th series (1890), p. 313.

[12] T. E. Harris, Some Theorems on the Bernoullian Multiplicative Process, Dissertation,
Princeton, 1947.

[13] H. Cramér, Random Variables and Probability Distributions, Cambridge Tract 36, 1937.

[14] D. V. Widder, The Laplace Transform, Princeton University Press, 1941.

[15] P. Fatou, "Sur une classe remarquable de séries de Taylor," Annales Sci. de l'École
Normale Sup. de Paris, Vol. 27, series 3 (1910), pp. 43-53.

[16] S. Lattès, "Sur les suites récurrentes non linéaires et sur les fonctions génératrices de
ces suites," Annales de la Fac. des Sciences de l'Univ. de Toulouse, Vol. 3, series 3
(1911), pp. 96-106.

[17] Niels Arley, On the Theory of Stochastic Processes and Their Applications to the Theory
of Cosmic Radiation, G. E. C. Gads Forlag, Copenhagen, 1943.

[18] E. C. Titchmarsh, The Theory of Functions, 2d Ed., Oxford, 1939.

[19] S. M. Shah, "On real continuous solutions of algebraic difference equations," Am.
Math. Soc. Bull., Vol. 53 (1947), pp. 548-568.



MOST POWERFUL TESTS OF COMPOSITE HYPOTHESES. I. NORMAL 

DISTRIBUTIONS 

By E. L. Lehmann and C. Stein
University of California, Berkeley 

Summary. For testing a composite hypothesis, critical regions are determined
which are most powerful against a particular alternative at a given level
of significance. Here a region is said to have level of significance $\epsilon$ if the
probability of the region under the hypothesis tested is bounded above by $\epsilon$. These
problems have been considered by Neyman, Pearson and others, subject to the
condition that the critical region be similar. In testing the hypothesis specifying
the value of the variance of a normal distribution with unknown mean against
an alternative with larger variance, and in some other problems, the best similar
region is also most powerful in the sense of this paper. However, in the analogous
problem when the variance under the alternative hypothesis is less than
that under the hypothesis tested, in the case of Student's hypothesis when the
level of significance is less than 1/2, and in some other cases, the best similar region
is not most powerful in the sense of this paper. There exist most powerful tests
which are quite good against certain alternatives in some cases where no proper
similar region exists. These results indicate that in some practical cases the
standard test is not best if the class of alternatives is sufficiently restricted.

1. Introduction. The problem to be discussed in this paper is that of testing
a composite hypothesis against a simple alternative. More specifically let
$\mathcal{F} = \{f\}$ be a family of probability density functions defined over a Euclidean space
and let $g$ be a probability density function not in $\mathcal{F}$. We wish to test the
hypothesis $H_0$ that the random variable $X = (X_1, \cdots, X_n)$ is distributed according
to a density $f$ of $\mathcal{F}$ against the alternative $H_1$ that $X$ is distributed according to
$g$. By a test we mean a region of rejection, $w$, in $R_n$.

Neyman and Pearson, in the fundamental paper [1] which laid the groundwork
of the theory of optimum tests, restricted their considerations to similar regions.
They considered a region (set) $w$ to be optimum for the given level of significance
$\epsilon$ if it maximizes the power

(1) $\int_w g(x)\,dx$

subject to the restriction

(2) $\int_w f(x)\,dx = \epsilon$ for all $f$ in $\mathcal{F}$.

As Neyman, Wald and others have pointed out, it is more natural to replace
the condition of similarity (2) by the weaker restriction

(3) $\int_w f(x)\,dx \le \epsilon$ for all $f$ in $\mathcal{F}$.



A region $w$ maximizing (1) subject to (3) is called most powerful against the
alternative $g$ at the level of significance $\epsilon$. Here and throughout the paper, all
functions and sets are assumed to be Borel measurable.

In the present paper we shall consider certain composite hypotheses, and derive
tests for them which are most powerful against a simple alternative. For the
cases in which these tests coincide with the standard similar regions it will thus
be established that no further increase in power is possible with tests of fixed
sample sizes. In the more usual situation where the most powerful test depends
strongly on the specific alternative chosen, no such absolute justification of the
standard test is possible. In these cases, any justification must take account
of the fact that it is desired to obtain good power against a large class of
alternatives. This can be done, for instance, by using Wald's definition of a most
stringent test [2] or his concept of minimizing the maximum risk.¹ If, on the other
hand, the class of alternatives is sufficiently restricted, the results of the present
paper indicate that for small samples there may exist a test which is appreciably
better than the standard test.

Frequently the probability of an error of the first kind is an analytic function
of a nuisance parameter for every choice of critical region. Hence, if it is known
that some nuisance parameter $\theta$ lies, say, in a certain finite interval $I$, then any
test which is similar for $\theta$ in $I$ will be similar for all $\theta$. Consequently, the
knowledge concerning $\theta$ cannot be used to find a more powerful test. On the other
hand, as is indicated at the end of section 5, restrictions of the nuisance
parameters may, for small samples, lead to considerably more powerful tests if the
condition of similarity is replaced by the weaker condition (3).

There is one class of problems to which it may be desirable to apply the method
of the present paper regardless of sample size; namely, if no similar region exists.
Suppose, for instance, that $X_1, \cdots, X_n$ are known to be normally and
independently distributed, $X_i$ having unknown mean $\xi_i$ and variance $\sigma_i^2$ for
$i = 1, \cdots, n$. For testing the hypothesis

$H_0$: $\sigma_i = 1$, $(i = 1, \cdots, n)$

no similar region exists, while it is easy to see that against any simple alternative

$H_1$: $\sigma_i = \sigma_{i1} < 1$, $\xi_i = \xi_{i1}$,

there exists a test which satisfies condition (3) and which has good power against
$H_1$ provided the $\sigma_{i1}$ are sufficiently small.

The present first part of this paper is restricted to hypotheses concerning
normal distributions. It is intended to extend the considerations to exponential
and rectangular distributions, to consider non-parametric problems and possibly
also more complicated problems connected with normal distributions, in
later parts of the paper.

¹ In an unpublished paper, it is shown by G. Hunt and C. Stein that the traditional test
is most stringent in several cases, including the (univariate) linear hypothesis and the
hypothesis specifying the ratio of the variances of two normal distributions. These results
can be extended to analogous problems for distributions other than the normal, and similar
results can be proved regarding minimization of the maximum risk if the weight function
has a certain type of symmetry.


2. Sufficient conditions for a most powerful test. The method which will be
used in this paper to obtain most powerful tests is an adaptation of the
fundamental lemma of Neyman and Pearson [1]. At the same time it is essentially
a special case of much more general results of Wald [3, 4], although the exact
conditions of Wald's investigation are not satisfied in most of our problems.

Let $h$ and $g$ be two functions defined over $R_n$, let $k$ be a constant and let $w$
be a region in $R_n$ such that

(4) $g(x) \ge k\,h(x)$ in $w$; $\quad g(x) \le k\,h(x)$ in $R_n - w$.

Then if $w'$ is such that

(5) $\int_{w'} h(x)\,dx \le \int_w h(x)\,dx$,

it follows as in the fundamental lemma, where in (5) equality is assumed instead
of inequality, that

(6) $\int_{w'} g(x)\,dx \le \int_w g(x)\,dx$.




Throughout the present paper we shall be concerned with the special case in
which $\mathcal{F}$ is an $s$-parameter family. We may denote the members of $\mathcal{F}$ by $f_\theta$ and
we shall obtain all members of $\mathcal{F}$ as $\theta$ ranges over a set $\omega$ in an $s$-dimensional
Euclidean space. In the theorem which we shall now state, we shall be concerned
with point functions $\lambda$ defined over $\omega$. We shall assume that $\lambda = c\mu$ where $c$
is a positive constant and $\mu$ a cumulative distribution function.² Also we suppose
that $f_\theta(x)$ is a measurable function of $x$ and $\theta$ jointly. However, the theorem
is also valid if $\omega$ is an abstract space and $\lambda$ a (finite) non-negative additive
set function (measure) over $\omega$. Such more general interpretation may be required
when applying the theory to non-parametric problems.

Theorem 1. Let $H_0$ be the hypothesis that the random variable $X$ is distributed
according to a density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the alternative that $X$
is distributed according to a density $g$. Let $\lambda$ be a function defined over $\omega$ and such
that

(7) $\lambda = c\mu$,

where $c$ is a positive constant and $\mu$ a cumulative distribution function. Let $k$ be a
constant and let $w$ be a region in $R_n$ such that

(8) $g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$; $\quad g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Suppose that $w$ is of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is, that

(9) $\int_w f_\theta(x)\,dx \le \epsilon$ for all $\theta$ in $\omega$,

and suppose that the subset of $\omega$ for which

(10) $\int_w f_\theta(x)\,dx < \epsilon$

has $\lambda$-measure zero. Then $w$ is most powerful for testing $H_0$ against $H_1$ at level of
significance $\epsilon$.

² The introduction of the distribution $\mu$ is simply a mathematical device and does not
imply that $\theta$ is a random variable (see Wald [16] p. 282).

Proof. Without loss of generality we shall assume $c = 1$. Let $w'$ be any
test of level of significance $\epsilon$. Then

(11) $\int_{w'} f_\theta(x)\,dx \le \epsilon$ for all $\theta$ in $\omega$,

and because of (7)

(12) $\int_\omega \left[\int_{w'} f_\theta(x)\,dx\right] d\lambda(\theta) \le \epsilon \int_\omega d\lambda(\theta) = \epsilon$.

Since $\lambda$ is of bounded variation we may interchange the order of integration in
(12) and obtain

(13) $\int_{w'} h(x)\,dx \le \epsilon$,

where

(14) $h(x) = \int_\omega f_\theta(x)\,d\lambda(\theta)$.

From (9) and the condition surrounding (10) it follows that

(15) $\int_\omega \left[\int_w f_\theta(x)\,dx\right] d\lambda(\theta) = \epsilon$,

and therefore that

(16) $\int_w h(x)\,dx = \epsilon$.

Thus $w$ and $w'$ satisfy conditions (4) and (5), and hence also (6), which completes
the proof.
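
The content of theorem 1 may be illustrated on a finite sample space, where integrals become sums and every competing region can be enumerated. The sketch below is illustrative only: the four-point sample space, the two null distributions, the alternative, the weights standing in for $\lambda$, and all function names are hypothetical and do not come from the paper.

```python
from itertools import combinations

def mixture(family, weights):
    """Discrete analogue of (14): h(x) = sum over theta of lambda(theta) * f_theta(x)."""
    return [sum(wt * f[i] for wt, f in zip(weights, family))
            for i in range(len(family[0]))]

def region_from_8(g, family, weights, k):
    """Discrete analogue of (8): the points where g(x) >= k * h(x)."""
    h = mixture(family, weights)
    return frozenset(i for i in range(len(g)) if g[i] >= k * h[i])

def prob(p, region):
    """Probability of a region under the point probabilities p."""
    return sum(p[i] for i in sorted(region))

def best_power_by_enumeration(g, family, eps, tol=1e-9):
    """Largest power among all regions whose size is at most eps under every null."""
    best = 0.0
    for r in range(len(g) + 1):
        for region in combinations(range(len(g)), r):
            if all(prob(f, region) <= eps + tol for f in family):
                best = max(best, prob(g, region))
    return best

# Hypothetical composite null {f_a, f_b} and simple alternative g.
f_a = [0.1, 0.2, 0.3, 0.4]
f_b = [0.2, 0.1, 0.4, 0.3]
g = [0.4, 0.4, 0.1, 0.1]
weights = [0.5, 0.5]  # lambda places equal mass on the two null distributions

w = region_from_8(g, [f_a, f_b], weights, k=1.0)
sizes = [prob(f_a, w), prob(f_b, w)]  # both equal 0.3, so (9) and (10) hold
power = prob(g, w)
best = best_power_by_enumeration(g, [f_a, f_b], eps=0.3)
```

In this example the size equals $\epsilon = 0.3$ at both points of increase of $\lambda$, and exhaustive enumeration finds no region of level 0.3 with power exceeding that of $w$.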





It is useful to notice that the assumptions of theorem 1 will be satisfied provided

$\int_w f_\theta(x)\,dx$

attains its maximum $\epsilon$ at all points of increase of $\lambda$, and therefore in particular
whenever $w$ is a similar region of size $\epsilon$.

We shall in many problems exhibit a function $\lambda$ which satisfies the conditions
of theorem 1 without giving the reasons which led us to this function. However,
the following comments concerning the tentative process that we used may be
helpful. One may first examine the known most powerful similar region. If
there exists a cumulative distribution function $\lambda$ such that (8) is the most powerful
similar region, the problem is solved. If the most powerful similar region
cannot even be approximated by (8) with a sequence of $\lambda$'s, it is reasonable to
conclude that the most powerful test is not similar. Because the probability
(under the null hypothesis) of any test is in all the problems considered here an
analytic function of the parameter, this implies that the probability (under the
null hypothesis) of the most powerful test attains its maximum at an at most
denumerable (in some cases finite) set of points. In all the cases of this kind
which we considered in the present part I, it was then possible to prove the
existence of a function $\lambda$ with a single point of increase, which satisfied the
conditions of theorem 1.

A theorem analogous to theorem 1 holds for most powerful similar regions.
Let $H_0$ and $H_1$ be as before and let $\lambda$ be a function of bounded variation not
necessarily non-decreasing. Let $w$ be a region in $R_n$ such that

$g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$; $\quad g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Let $w$ be a similar region of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that
is, let

(18) $\int_w f_\theta(x)\,dx = \epsilon$ for all $\theta$ in $\omega$;

then $w$ is a most powerful similar region for testing $H_0$ against $H_1$.

For all the problems considered in this paper we shall prove the existence of
functions $\lambda$ satisfying the conditions of theorem 1, but we have not investigated
the corresponding existence problem in general. On the other hand one verifies
easily that for many of the cases treated here in which the most powerful test is
not similar, the method for obtaining most powerful similar regions does not
apply. However, for all the problems considered in the present paper the most
powerful similar tests can be obtained easily by other methods [1, 5, 6, 7, 8].
For most of the problems the corresponding derivations have been carried out
in the literature.





Although we restrict ourselves in the present paper to the problem of maximizing
the power at a single alternative, theorem 1 clearly also applies to the more
general problem of maximizing the average power over surfaces in a space of
alternatives. Such problems have been considered from the point of view of
similar regions by Wald, Hsu and others [9, 10, 11].


3. Testing the values of one or several variances. Let $X_1, \cdots, X_n$ be a sample
from a normal population with mean $\xi$ and variance $\sigma^2$, both unknown. We
want to test the hypothesis $H_0$ that $\sigma = \sigma_0$ against the simple alternative $H_1$ that
$\sigma = \sigma_1$, $\xi = \xi_1$. We shall show that the most powerful test for $H_0$ against $H_1$
is

(19) $\sum (x_i - \xi_1)^2 \le k$ when $\sigma_1 < \sigma_0$,

(20) $\sum (x_i - \bar{x})^2 \ge c$ when $\sigma_1 > \sigma_0$,

where $k$ and $c$ are determined by the level of significance. Thus the best similar
region is most powerful if the variance under the alternative is greater than that
under the null hypothesis, while the most powerful tests against the other
alternatives are not similar. That the region $\sum (x_i - \bar{x})^2 \ge c$ ($\le c'$) is most powerful
of all similar regions against $\sigma_1 > \sigma_0$ ($\sigma_1 < \sigma_0$) was shown by Neyman and
Pearson [1].

We consider first the case $\sigma_1 < \sigma_0$, and apply theorem 1 with $\lambda$ a step function
having a single jump at $\xi_1$, that is,

(21) $\lambda(\xi) = 0$ if $\xi < \xi_1$; $\quad \lambda(\xi) = 1$ if $\xi \ge \xi_1$.

The region $w$ given by (8) thus becomes

(22) $\dfrac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i - \xi_1)^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i - \xi_1)^2\right]} \ge k'$,

which is equivalent to

(23) $\sum (x_i - \xi_1)^2 \le l$,

since $\sigma_1 < \sigma_0$. The size of the region (23), that is, its probability under the null
hypothesis, is a function of $\xi$ and clearly attains its maximum when $\xi = \xi_1$. Thus
all conditions of theorem 1 are satisfied provided we choose $k$ so that the maximum
size of (23) equals $\epsilon$.
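
The statement that the size of (23) is greatest at $\xi = \xi_1$ can be examined numerically. The sketch below treats only the special case $n = 1$, where the region is the interval $|x - \xi_1| \le \sqrt{l}$ and its size has a closed form in terms of the normal distribution function; the restriction to $n = 1$, the function names, and the numerical values are assumptions made for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def size_n1(xi, xi1, sigma0, l):
    """Probability of (x - xi1)^2 <= l when x is normal with mean xi and s.d. sigma0."""
    r = math.sqrt(l)
    return norm_cdf((xi1 + r - xi) / sigma0) - norm_cdf((xi1 - r - xi) / sigma0)

def solve_threshold(eps, sigma0, lower=0.0, upper=100.0, iterations=200):
    """Bisection for the l at which the maximum size (attained at xi = xi1) equals eps."""
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if size_n1(0.0, 0.0, sigma0, mid) < eps:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

# Hypothetical values: sigma0 = 1, xi1 = 0, level 0.05.
l = solve_threshold(0.05, 1.0)
peak = size_n1(0.0, 0.0, 1.0, l)
sizes_on_grid = [size_n1(j / 10.0, 0.0, 1.0, l) for j in range(-50, 51)]
```

For every mean on the grid the size stays at or below its value at $\xi = \xi_1$, so the level condition (9) holds with equality only at the single point of increase of $\lambda$.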

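Before considering the case $\sigma_1 > \sigma_0$ we state for later reference the following:

Lemma 1. If $\sigma_1 > \sigma_0$ there exists an absolutely continuous non-decreasing function
$\lambda$ of bounded variation such that

(24) $\dfrac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left[-\dfrac{x^2}{2\sigma_1^2}\right] = \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\dfrac{(x-\xi)^2}{2\sigma_0^2}\right] d\lambda(\xi)$.

This follows immediately from the well known representation of $\exp\left(-\frac{x^2}{2}\right)$
as a Laplace transform by applying a translation, and is easily verified directly
by substituting

(25) $\lambda'(\xi) = \dfrac{1}{\sqrt{2\pi(\sigma_1^2 - \sigma_0^2)}}\exp\left[-\dfrac{\xi^2}{2(\sigma_1^2 - \sigma_0^2)}\right]$.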
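
The direct verification mentioned for lemma 1 may also be carried out numerically. The sketch below assumes the normalized reading of the lemma, namely that a normal density with variance $\sigma_1^2$ is a mixture of normal densities with variance $\sigma_0^2$ whose means are themselves normally distributed with variance $\sigma_1^2 - \sigma_0^2$; the trapezoidal rule, the function names, and the numerical values are illustrative choices.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, sigma0, sigma1, half_width=12.0, steps=4000):
    """Trapezoidal approximation to the integral over xi of
    normal_pdf(x; xi, sigma0^2) * lambda'(xi), with lambda' normal of
    variance sigma1^2 - sigma0^2."""
    tau2 = sigma1 ** 2 - sigma0 ** 2
    limit = half_width * math.sqrt(tau2)
    dxi = 2.0 * limit / steps
    total = 0.0
    for j in range(steps + 1):
        xi = -limit + j * dxi
        weight = 0.5 if j in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * normal_pdf(x, xi, sigma0 ** 2) * normal_pdf(xi, 0.0, tau2)
    return total * dxi

# Hypothetical values with sigma1 > sigma0.
sigma0, sigma1 = 1.0, 1.5
gaps = [abs(mixture_density(x, sigma0, sigma1) - normal_pdf(x, 0.0, sigma1 ** 2))
        for x in (0.0, 0.7, -2.0)]
```

The agreement at each test point reflects the familiar fact that variances add under convolution of normal distributions, which is what makes the second factor of the likelihood ratio constant in the applications of the lemma.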




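Now let $\sigma_1 > \sigma_0$ and $n > 1$. The region $w$ given by (8) can be expressed in the
form

(26) $\dfrac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i - \bar{x})^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i - \bar{x})^2\right]} \cdot \dfrac{\exp\left[-\frac{n}{2\sigma_1^2}(\bar{x} - \xi_1)^2\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{n}{2\sigma_0^2}(\bar{x} - \xi)^2\right] d\lambda(\xi)} \ge k$.

By lemma 1 there exists an absolutely continuous function $\lambda$ for which the second
factor is constant. For this $\lambda$ (26) is equivalent to

(27) $\sum (x_i - \bar{x})^2 \ge c$,

and since this is a similar region, the conditions of theorem 1 are satisfied
provided $c$ is chosen so as to give the correct level of significance.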

We next consider the problem in which the random variables $X_i$ $(i = 1, \cdots, n)$
are independently normally distributed with unknown means $\xi_i$ and unknown
variances $\sigma_i^2$. We wish to test the hypothesis $H_0$: $\sigma_i = \sigma_{i0}$ for $i = 1, \cdots, n$
against the alternative $H_1$: $\sigma_i = \sigma_{i1}$, $\xi_i = \xi_{i1}$. Feller [12] showed that there
exist no similar regions for this problem. However, as we shall show now, when
the critical regions are not required to be similar, non-trivial tests against $H_1$
do exist provided $\sigma_{i1} < \sigma_{i0}$ for at least one value of $i$.

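Let us assume without loss of generality that $\sigma_{i1} < \sigma_{i0}$ for $i = 1, \cdots, m$,
$\sigma_{i1} \ge \sigma_{i0}$ for $i = m + 1, \cdots, n$, where $n - m$ may be zero but where for the
moment we shall assume $m > 0$. With $\lambda(\xi_1, \cdots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$ the region
(8) becomes

(28) $\displaystyle\prod_{i=1}^{m} \dfrac{\exp\left[-\frac{(x_i - \xi_{i1})^2}{2\sigma_{i1}^2}\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{(x_i - \xi_i)^2}{2\sigma_{i0}^2}\right] d\lambda_i(\xi_i)} \cdot \prod_{i=m+1}^{n} \dfrac{\exp\left[-\frac{(x_i - \xi_{i1})^2}{2\sigma_{i1}^2}\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{(x_i - \xi_i)^2}{2\sigma_{i0}^2}\right] d\lambda_i(\xi_i)} \ge k$.

For $\lambda_i$ $(i = 1, \cdots, m)$ we take step functions with a single jump at $\xi_{i1}$, while
for the remaining $\lambda$'s we choose the absolutely continuous functions which make
the second factor constant and whose existence is guaranteed by lemma 1. The
region (28) thus reduces to

(29) $\displaystyle\sum_{i=1}^{m} \left(\dfrac{1}{\sigma_{i1}^2} - \dfrac{1}{\sigma_{i0}^2}\right)(x_i - \xi_{i1})^2 \le c$.

Since the probability of the region (29) is independent of $\xi_{m+1}, \cdots, \xi_n$ and with
varying $\xi_1, \cdots, \xi_m$ takes on its maximum when $\xi_i = \xi_{i1}$, it follows from theorem
1 that this region is most powerful for testing $H_0$ against $H_1$.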

We still have to consider the case $m = 0$, that is, the case in which $\sigma_{i1} \ge \sigma_{i0}$
for all $i$. To treat this problem we adjoin to the variables $X_1, \cdots, X_n$ a random
variable $Y$ uniformly distributed between 0 and 1, that is, essentially a table of
random numbers. In the space of $n + 1$ random variables we determine a region
$w$ according to (8), letting $\lambda(\xi_1, \cdots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$ and choosing the $\lambda$'s so
as to make the left hand side of (8) equal to the right hand side. This is possible
by lemma 1 and with this choice of the $\lambda$'s the inequalities (8) become

(30) $k \ge k$ in $w$; $\quad k \le k$ in $R_{n+1} - w$,

and hence they impose no restrictions on $w$. Thus any similar region of the correct
size will satisfy the conditions of theorem 1. It follows that the region

(31) $w$: $0 \le y \le \epsilon$,

being a similar region of size $\epsilon$, is most powerful. This result means that we do
not use the observations $x_1, \cdots, x_n$ at all but consult a table of random numbers.

The situation just described occurs in other problems to which the same
method of proof can be applied. It is therefore convenient for later reference to
formulate the following

Theorem 2. Let $H_0$ be the hypothesis that the random variable $X$ is distributed
according to a probability density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the
alternative that $X$ is distributed according to the density function $g$. Let $Y$ be a random
variable known to be uniformly distributed over the interval [0, 1]. If there exists a
real valued function $\lambda$ satisfying (7) for which

(32) $g(x) = k \int_\omega f_\theta(x)\,d\lambda(\theta)$,

then the critical region $0 \le y \le \epsilon$ is most powerful for testing $H_0$ against $H_1$ at level
of significance $\epsilon$.


4. Testing equality of variances and the value of the circular serial correlation
coefficient. For each $i = 1, \cdots, m$ let $X_{ij}$ $(j = 1, \cdots, n_i)$ be a sample from a
normal distribution with $E(X_{ij}) = \xi_i$ and $E(X_{ij} - \xi_i)^2 = \sigma_i^2$. We are concerned
with the hypothesis $H_0$ that $\sigma_1 = \cdots = \sigma_m$, where first we shall
assume the $\xi$'s to be known, so that without loss of generality we may assume
them equal to 0. The alternative hypothesis specifies $\sigma_i = \sigma_{i1}$, $i = 1, \cdots, m$.
Let $\sigma$ denote the unknown common variance under $H_0$ and let $\lambda(\sigma)$ be a step
function with a single jump at a point $\sigma_0$ to be determined later. With
$k = \prod_{i=1}^{m} \left(\dfrac{\sigma_0}{\sigma_{i1}}\right)^{n_i}$ the test (8) takes on the form


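(33) $\displaystyle\sum_{i=1}^{m} \left(\dfrac{1}{\sigma_{i1}^2} - \dfrac{1}{\sigma_0^2}\right) \sum_{j=1}^{n_i} x_{ij}^2 \le 0$

or equivalently

(34) $\dfrac{\sum_{i=1}^{m} \frac{1}{\sigma_{i1}^2} \sum_{j=1}^{n_i} x_{ij}^2}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} x_{ij}^2} \le d$.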



Since the function on the left hand side is homogeneous of degree 0 in the $x$'s,
this is a similar region and the conditions of theorem 1 are therefore satisfied
provided the region has the correct size. This can be achieved for any level
of significance $\epsilon$ by proper choice of $d$.

As stated earlier, the conditions of theorem 1 imply that the size of the critical
region is equal to $\epsilon$ at all points of increase of $\lambda$. As a consequence, if the size
equals $\epsilon$ at only a finite number of points of $\omega$, $\lambda$ must be a step function. Also
if each point of a certain interval is a point of increase of $\lambda$, the critical region
must be similar over that interval (and, if the functions involved are analytic,
the region must be similar over $\omega$). However, the last problem shows that the
converse of neither of these two statements is correct. For the region (34)
is a similar region although the corresponding $\lambda$ has only a single point of increase.

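Next we consider the hypothesis of equality of variances without assuming the
means to be known. For the case $m = 2$ the most powerful similar region was
obtained by Neyman and Pearson [1]. We assume first that $n_i > 1$ for all $i$,
and we take $\lambda(\sigma, \xi_1, \cdots, \xi_m) = \lambda_0(\sigma)\prod_{i=1}^{m} \lambda_i(\xi_i)$, with $\lambda_0(\sigma)$ as before a step
function with a single jump at a point $\sigma_0$ to be determined later. Suppose now that
$\sigma_0 > \sigma_{i1}$ for $i = 1, \cdots, s$; $\sigma_0 \le \sigma_{i1}$ for $i = s + 1, \cdots, m$; $\sigma_{11} \le \sigma_{21} \le \cdots$,
where $0 \le s \le m$ and $s$ depends on $\sigma_0$. Then define

(35) $\lambda_i(\xi_i) = 0$ if $\xi_i < \xi_{i1}$; $\quad \lambda_i(\xi_i) = 1$ if $\xi_i \ge \xi_{i1}$

for $i = 1, \cdots, s$ and use lemma 1 for $i = s + 1, \cdots, m$.

For proper choice of $k$ the critical region will then be determined by the
inequality

(36) $\displaystyle\sum_{i=1}^{s} \left(\dfrac{1}{\sigma_0^2} - \dfrac{1}{\sigma_{i1}^2}\right) \sum_{j=1}^{n_i} (x_{ij} - \xi_{i1})^2 + \sum_{i=s+1}^{m} \left(\dfrac{1}{\sigma_0^2} - \dfrac{1}{\sigma_{i1}^2}\right) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \ge 0$.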


The probability of this region, computed under $H_0$, is independent of $\xi_{s+1}, \cdots, \xi_m$
and for any $\sigma$ attains its maximum when $\xi_i = \xi_{i1}$ $(i = 1, \cdots, s)$. Since the
probability of the region is independent of $\sigma$ when $\xi_i = \xi_{i1}$ for $i = 1, \cdots, s$,
the conditions of theorem 1 are again established. That for $\xi_i = \xi_{i1}$ the size of
(36) goes continuously from 0 to 1 with decreasing $\sigma_0$ is easily checked since at
the only doubtful points $\sigma_0 = \sigma_{i1}$ (where the value of $s$ changes), the corresponding
coefficient $\dfrac{1}{\sigma_0^2} - \dfrac{1}{\sigma_{i1}^2}$ passes through 0.

We still have to consider the case that some of the $n_i$ are equal to 1. If $n_i = 1$
for some $i \le s$ there is no change whatever, while if $n_i = 1$ for some $i > s$,
the corresponding term in (36) vanishes. It follows easily that if $n_i > 1$ for at
least one value of $i > 1$ the solution (36) is valid. On the other hand, if $n_i = 1$
for all $i > 1$, we can apply theorem 2 by taking $\sigma_0 = \sigma_{11}$, $\lambda_1(\xi_1)$ as a step function
with a single jump at $\xi_{11}$ and the remaining $\lambda_i(\xi_i)$ according to lemma 1. It thus
follows that for this problem no non-trivial test exists.

The following problem can be reduced to the hypothesis of equality of variances
with means assumed known: Under the null hypothesis $X_1, \cdots, X_n$ have
a joint multivariate normal distribution with density

$C \exp\left[-\dfrac{1}{2\sigma^2}\sum a_{ij} x_i x_j\right]$,

where the $a$'s are known and where $\sigma$ is an unknown scale factor. Under $H_1$
the $X$'s have a joint multivariate normal distribution with density

$C \exp\left[-\dfrac{1}{2}\sum b_{ij} x_i x_j\right]$.

A number of hypotheses specifying the value of one or several
correlation coefficients have this form. The most powerful test of $H_0$ against
$H_1$ is given by

$\dfrac{\sum b_{ij} x_i x_j}{\sum a_{ij} x_i x_j} \le c$,

as is easily shown by applying a non-singular linear transformation which reduces
$\sum b_{ij} x_i x_j$ to diagonal form and $\sum a_{ij} x_i x_j$ to a sum of squares, or by applying
directly the method of proof of the earlier problem.

A corresponding reduction when the $X$'s have a common but unknown mean is
usually impossible. One problem of this kind for which the solution is simple is
the hypothesis specifying the value of a serial correlation coefficient in a circular
population. The most powerful similar region for testing this hypothesis was
obtained in [7]. Consider the probability density function

(37) $C \exp\left\{-\dfrac{1}{2\sigma^2}\displaystyle\sum_{i=1}^{n} \left[(x_i - \xi) - \delta(x_{i+1} - \xi)\right]^2\right\}$, $\quad (x_{n+1} = x_1)$, $|\delta| < 1$,

and let $H_0$ specify $\delta = \delta_0$ while $H_1$ assigns to the parameters the values $\sigma_1$, $\xi_1$, $\delta_1$.
Then the most powerful test of $H_0$ against $H_1$ is


(38) 


S(.E, - .t)(Ti+1 ~x)^ 
■" 'fix, - xY 


if 5i > , 


— Jl)(.Tj+l — ?l) ^ ,, , 


Si < So 


We shall omit the proof of this result, since the method is the same as in the other 
problems considered in this section. 


5. Student's hypothesis and some generalizations. As the principal result of
the present section we shall prove that for testing Student's hypothesis against a
simple alternative the most powerful test is a non-similar region of the form

(39) $\sum (x_i - a)^2 \le k$,

if the level of significance $\epsilon$ is less than or equal to 1/2. Here $a$ and $k$ depend on
$\epsilon$ and on the alternative, and they will not be determined explicitly. It will be
shown also that if $\epsilon$ is greater than or equal to 1/2, Student's test is most powerful.
These results will be extended rather easily to the general univariate linear
hypothesis. The corresponding investigation for similar regions was carried
through for Student's hypothesis by Neyman and Pearson [1], while the extension
to a general linear hypothesis is contained in a paper by Hsu [13].

The proof of the main result mentioned above is rather lengthy. We shall
begin by proving two lemmas.

Lemma 2. Let $Y_1, \cdots, Y_n$ be $n$ independent random variables, normally
distributed with 0 mean and unit variance, and let

(40) $P(a, k) = P\left\{\sum (Y_i - a)^2 \le (n - k)a^2\right\}$; $\quad \varphi(k) = \sup_a P(a, k)$ for $0 < k < n$, $0 < a$.

Then for each $k$ there exists $a(k)$ such that

(41) $P(a(k), k) = \varphi(k)$.

Proof. If Z, = Y,/a, {i = 1, • ■ • , n) the Z’s are independently normally 
distributed wnth zero mean and variance 1/a and (40) may be written as 

(42) P(a,fc) =P{2(Z. - l)"<n-fcl 

Hence it is seen that for any k, P{a, k) tends to zero as a tends to either zero or 
infinity. This proves the lemma since for any k, P [a, k) is a continuous function 
of a. 
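
The behaviour of $P(a, k)$ described in lemma 2 can be illustrated numerically in the simplest case $n = 1$, where the probability in (40) is a difference of two normal distribution function values. The following Python sketch is only an illustration under that assumption; the function names, the grid search and the grid bounds are hypothetical choices and not part of the paper.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_region(a, k):
    """P(a, k) of (40) for n = 1: P{(Y - a)^2 < (1 - k) a^2}, Y standard normal."""
    r = math.sqrt(1.0 - k)
    return normal_cdf(a * (1.0 + r)) - normal_cdf(a * (1.0 - r))

def phi(k, a_max=20.0, steps=4000):
    """Crude grid approximation to the supremum over a > 0; returns (value, a)."""
    grid = (a_max * j / steps for j in range(1, steps + 1))
    return max((p_region(a, k), a) for a in grid)

# P(a, k) is small for very small and very large a, so the supremum
# is attained at an interior point of the grid.
for k in (0.01, 0.5, 0.99):
    value, a_k = phi(k)
    print(k, round(value, 4), round(a_k, 3))
```

In this sketch the approximate supremum stays below 1/2 and decreases as $k$ increases, in line with steps (i) and (ii) of the proof of lemma 3.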

Lemma 3. Given any $\epsilon$, $0 < \epsilon < \frac{1}{2}$, there exists $k(\epsilon)$ between zero and $n$ such that
$\varphi(k(\epsilon)) = \epsilon$.

Proof. The proof will be given in a number of steps.

(i) $\varphi(k) \to \frac{1}{2}$ as $k \to 0$.

Clearly $P(a, k)$ never exceeds $\frac{1}{2}$, since the region in (40) implies $\sum Y_i > 0$. The result will therefore follow if we exhibit
a sequence $a_k$ such that $P(a_k, k) \to \frac{1}{2}$ as $k \to 0$. Let $a_k = 1/\sqrt{k}$. Then

(43)    $P(a_k, k) = P\{\sqrt{k} \sum Y_i^2 - 2 \sum Y_i + \sqrt{k} < 0\}.$

The right hand side is a continuous function of $k$ and therefore tends to

(44)    $P\{\sum Y_i > 0\} = \frac{1}{2},$

as $k$ tends to zero.

(ii) $\varphi(k) \to 0$ as $k \to n$.

If $P(a, k)$ is written in the form (42) as the integral of the probability density
of the $Z$'s, the region of integration is independent of $a$ and its volume tends to
0 as $k$ tends to $n$. On the other hand the probability density depends on $a$
but is bounded, uniformly in $a$, over the region of integration if $k > 0$, and hence the
result follows.

(iii) If $0 < k_0$, $P(a, k)$ tends to zero uniformly for $k$ in the interval $k_0 \leq k < n$
as $a$ tends to zero or infinity.

This follows from the fact that $0 \leq P(a, k) \leq P(a, k_0)$, since $P(a, k_0)$ tends to
0 as $a$ tends to zero or infinity.

(iv) Given $k_0$ and $k_1$ there exist numbers $a_0$ and $a_1$ with $0 < a_0 < a_1 < \infty$,
such that $0 < k_0 \leq k \leq k_1 < n$ implies $a_0 \leq a(k) \leq a_1$.

If this were not true there would exist a sequence $k^{(i)}$ with $k_0 \leq k^{(i)} \leq k_1$ and
$a(k^{(i)})$ tending to infinity or zero. Then $\varphi(k^{(i)}) = P(a(k^{(i)}), k^{(i)})$ would tend to zero by (iii).
On the other hand consider $P(1, k)$ for $k_0 \leq k \leq k_1$. This is a continuous non-vanishing
function of $k$ and hence attains its lower bound $m$ for some $k$ in $k_0 \leq k \leq k_1$.
Therefore $m$ is positive, and since $\varphi(k) \geq P(1, k) \geq m$ we have a contradiction.

(v) $\varphi(k)$ is continuous on the interval $k_0 \leq k \leq k_1$.

To see this, select $a_0$ and $a_1$ in accordance with (iv). Then $P(a, k)$ is uniformly
continuous in the rectangle $a_0 \leq a \leq a_1$, $k_0 \leq k \leq k_1$. Given $\eta > 0$ let $\delta$ be
such that $|k' - k''| < \delta$ implies $|P(a, k') - P(a, k'')| < \eta$. Then

        $\varphi(k') \geq P(a(k''), k') \geq P(a(k''), k'') - \eta = \varphi(k'') - \eta,$

and by symmetry $\varphi(k'') \geq \varphi(k') - \eta$, which establishes the continuity of $\varphi$.

The proof of the lemma is now immediate. For let $0 < \epsilon < \frac{1}{2}$. It follows
from (i) and (ii) that there exist $k_0$ and $k_1$ such that

        $\varphi(k_0) < \epsilon/2, \qquad \varphi(k_1) > \epsilon + \ldots,$

and hence by (v) there exists $k(\epsilon)$ for which $\varphi(k(\epsilon)) = \epsilon$.

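Let us now consider Student's hypothesis. The random variables $X_1, \ldots, X_n$
are a sample from a normal distribution which under $H_0$ has mean 0 and unknown
variance $\sigma^2$, while under $H_1$ the mean is $\xi_1$ and the variance $\sigma_1^2$. Without
loss of generality we shall assume $\xi_1 > 0$. Applying theorem 1 with $\lambda$ a step-function
with a single jump at a point $\sigma_0 > \sigma_1$ to be determined later, we obtain
the critical region in the form

(45)    $\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) \sum x_i^2 - 2 \frac{\xi_1}{\sigma_1^2} \sum x_i < c.$

Let $y_i = x_i/\sigma$ so that under $H_0$ the $Y$'s are distributed with zero mean and unit
variance. Then (45) becomes

(46)    $\sum y_i^2 - 2 \frac{\xi_1}{\sigma (1 - \sigma_1^2/\sigma_0^2)} \sum y_i < \frac{c\,\sigma_1^2}{\sigma^2 (1 - \sigma_1^2/\sigma_0^2)},$

which may be written as

(47)    $\sum (y_i - a)^2 < (n - k) a^2,$

where

(48)    $a = \frac{\xi_1}{\sigma (1 - \sigma_1^2/\sigma_0^2)}, \qquad k = -\frac{c\,\sigma_1^2 (1 - \sigma_1^2/\sigma_0^2)}{\xi_1^2}.$

As $\sigma$ varies from 0 to $\infty$, $a$ goes from $\infty$ to 0. Let $P(a, k)$, $\varphi(k)$ and $a(k)$ be
defined as in lemma 2. Given the level of significance $\epsilon$ $(0 < \epsilon < \frac{1}{2})$, let $k^*$
and $a^*$ be determined according to lemmas 2 and 3 so that

(49)    $\varphi(k^*) = \epsilon$ and $P(a^*, k^*) = \varphi(k^*).$

We now select $\sigma_0 > \sigma_1$ and $c$ so that

(50)    $a^* = \frac{\xi_1}{(1 - \sigma_1^2/\sigma_0^2)\,\sigma_0}$ and $k^* = -\frac{c\,\sigma_1^2 (1 - \sigma_1^2/\sigma_0^2)}{\xi_1^2}.$

We have to show that for this choice of $\sigma_0$ and $c$ the size of the critical region attains
its maximum when $\sigma = \sigma_0$ and that this maximum size is $\epsilon$. Substituting
from (50) we express the region (47) in the form

(51)    $\sum \left(y_i - \frac{\sigma_0}{\sigma} a^*\right)^2 < (n - k^*) \left(\frac{\sigma_0}{\sigma} a^*\right)^2.$

Thus the probability of the region is

(52)    $P\left(\frac{\sigma_0}{\sigma} a^*, k^*\right).$

As $\sigma$ varies, (52) attains its maximum when $\frac{\sigma_0}{\sigma} a^* = a(k^*) = a^*$, that is, when
$\sigma = \sigma_0$, and the maximum value of (52) is $\varphi(k^*) = \epsilon$.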

This derivation is valid even when $n = 1$, i.e., when the hypothesis $\xi = 0$ is
to be tested by observing only a single random variable $X$, known to be normally
distributed but whose mean $\xi$ and variance $\sigma^2$ are unknown. For this problem
no similar region exists. However, critical regions of the form $0 < \xi_1 - a <
X < \xi_1 + b$ will give any level of significance $< \frac{1}{2}$ for proper choice of $a$ and $b$,
while the power of such regions will tend to 1 as $\sigma_1$ tends to 0. Therefore, the
power of the most powerful test will be close to 1 if $\sigma_1$ is sufficiently small.
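
The remark on the case $n = 1$ can be checked with a few lines of Python. The sketch below evaluates the size of an interval region $\xi_1 - a < X < \xi_1 + b$ under the hypothesis (mean 0, variance $\sigma^2$) over a grid of $\sigma$, and its power under the alternative (mean $\xi_1$, variance $\sigma_1^2$). The numerical values $\xi_1 = 1$, $a = b = 0.5$ and the grid are illustrative assumptions, not values taken from the paper.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def size(sigma, xi1=1.0, a=0.5, b=0.5):
    """Probability of xi1 - a < X < xi1 + b when X ~ N(0, sigma^2)."""
    return normal_cdf((xi1 + b) / sigma) - normal_cdf((xi1 - a) / sigma)

def power(sigma1, a=0.5, b=0.5):
    """Probability of the same interval when X ~ N(xi1, sigma1^2)."""
    return normal_cdf(b / sigma1) - normal_cdf(-a / sigma1)

# Largest size over a grid of sigma values (hypothetical grid 0.01, ..., 20).
sigmas = [0.01 * j for j in range(1, 2001)]
level = max(size(s) for s in sigmas)

print(round(level, 4), [round(power(s1), 4) for s1 in (1.0, 0.3, 0.1)])
```

Because the interval lies entirely to the right of zero, its probability under the hypothesis cannot reach 1/2 for any $\sigma$, while the power increases toward 1 as $\sigma_1$ shrinks.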



Having completed the discussion of the case $\epsilon < \frac{1}{2}$ let us next suppose that
$\epsilon > \frac{1}{2}$. We shall need the following

Lemma 4. Let $c$ and $a_1$ be positive constants. Then there exists a function $f$
such that $f(a) = 0$ when $a < a_1$ and such that for all $w > 0$

(53)    $\int_0^\infty e^{-aw} f(a)\, da = e^{-a_1 w - c\sqrt{w}}.$

This follows from the well known representation of $e^{-c\sqrt{w}}$ as a Laplace transform
by applying a translation. (53) can be checked directly by substituting

(54)    $f(a) = \frac{c}{2\sqrt{\pi}} (a - a_1)^{-3/2} \exp\left[-\frac{c^2}{4(a - a_1)}\right]$  for $a > a_1$.

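Applying theorem 1 to Student's hypothesis, where again we shall assume
$\xi_1$ to be positive, for proper choice of $k$ we obtain from (9)

(55)    $\frac{\exp\left[-\frac{1}{2\sigma_1^2} \sum x_i^2 + \frac{\xi_1}{\sigma_1^2} \sum x_i\right]}{\int_0^\infty \exp\left[-\frac{1}{2\sigma^2} \sum x_i^2\right] \frac{1}{\sigma^n}\, d\lambda(\sigma)} \geq 1.$

It follows from lemma 4 that for any positive $c$ there exists a non-decreasing
function $\lambda$ of bounded variation with $\lambda(\sigma)$ constant for $\sigma > \sigma_1$, such that

(56)    $\int_0^\infty \exp\left[-\frac{1}{2\sigma^2} \sum x_i^2\right] \frac{1}{\sigma^n}\, d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2} \sum x_i^2 - c \sqrt{\sum x_i^2}\right].$

For this choice of $\lambda$, (55) reduces to

(57)    $\exp\left[\frac{\xi_1}{\sigma_1^2} \sum x_i\right] \geq \exp\left[-c \sqrt{\sum x_i^2}\right]$

and hence to

(58)    $\frac{\sum x_i}{\sqrt{\sum x_i^2}} \geq c', \qquad c' = -\frac{c\,\sigma_1^2}{\xi_1}.$

This is a similar region and therefore most powerful for testing Student's hypothesis
against $H_1$. By adjusting $c$, the size of the region can be made equal
to any $\epsilon > \frac{1}{2}$.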

The argument for $\epsilon > \frac{1}{2}$ must be modified slightly in the case $n = 1$, that is,
when we want to test Student's hypothesis on the basis of a single observation.
Let us adjoin to the variable $X$ a random variable $Y$ known to be uniformly
distributed over the interval [0, 1]. Using the same $\lambda$ and $k$ as before, (58)
becomes

(59)    $\frac{x}{\sqrt{x^2}} \geq c'.$

For $c' = -1$ the critical region includes all points $(x, y)$ for which $x$ is positive
while (59) places no restriction on which of the remaining points to include in
the critical region. The similar region

(60)    $x > 0; \qquad x < 0, \quad 0 < y < 2(\epsilon - \tfrac{1}{2})$

therefore satisfies all conditions of theorem 1 and hence is most powerful.

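In extending these results to the general linear hypothesis, we shall assume the
hypothesis reduced to canonical form [14, 15]. We shall therefore assume that
$X_1, \ldots, X_n$ are normally distributed with common variance $\sigma^2$ which is unknown
under $H_0$ and has the value $\sigma_1^2$ under $H_1$. Furthermore, under $H_0$, $E(X_i) = 0$
for $i = 1, \ldots, s, s + 1, \ldots, m$, $E(X_i)$ unknown for $i = m + 1, \ldots, n$, while
under $H_1$, $E(X_i) = 0$ for $i = s + 1, \ldots, m$, $E(X_i) = \xi_{i1}$ for the remaining values
of $i$.

For $\epsilon < \frac{1}{2}$ we shall consider critical regions of the form

(61)    $\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s} (x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n} (x_i - \xi_{i1})^2\right]\right\}}{\exp\left\{-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{s} x_i^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n} (x_i - \xi_{i1})^2\right]\right\}} > k,$

which are obtained from (8) by substituting for $\lambda$ a step-function with a single
jump at the parameter point $(\sigma_0, \xi_{m+1,1}, \ldots, \xi_{n1})$. Making an orthonormal
transformation from $x_1, \ldots, x_s$ to $y_1, \ldots, y_s$ such that
$y_1 = \sum_{i=1}^{s} \xi_{i1} x_i \big/ \sqrt{\sum_{i=1}^{s} \xi_{i1}^2}$, and
letting $y_i = x_i$ for $i = s + 1, \ldots, m$, $y_i = x_i - \xi_{i1}$ for $i = m + 1, \ldots, n$,
(61) reduces to

(62)    $\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{n} y_i^2 - 2 y_1 \sqrt{\sum_{i=1}^{s} \xi_{i1}^2} + \sum_{i=1}^{s} \xi_{i1}^2\right]\right\}}{\exp\left\{-\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} y_i^2\right\}} > k.$

For $\sigma_0 > \sigma_1$ we can rewrite (62), for a suitable constant $c$, as

(63)    $\left(y_1 - \frac{\sigma_0^2}{\sigma_0^2 - \sigma_1^2} \sqrt{\sum_{i=1}^{s} \xi_{i1}^2}\right)^2 + \sum_{i=2}^{n} y_i^2 < c,$

and we see that under $H_0$ for any $\sigma$ the size of this region considered as a function
of the unknown means of $Y_{m+1}, \ldots, Y_n$ takes on its maximum when these
means are zero, i.e. when $\xi_i = \xi_{i1}$ for $i = m + 1, \ldots, n$. For these maximizing
values of the means the existence of a suitable $\sigma_0$ and $c$ follows from the corresponding
result in connection with Student's hypothesis.

Thus the most powerful test for testing $H_0$ against $H_1$ at level of significance
$\epsilon \leq \frac{1}{2}$ has the form

(64)    $\sum_{i=1}^{s} \left(x_i - \frac{\sigma_0^2}{\sigma_0^2 - \sigma_1^2} \xi_{i1}\right)^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n} (x_i - \xi_{i1})^2 < c.$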



It is interesting that the variables $X_i$ $(i = m + 1, \ldots, n)$, which may be discarded
when considerations are restricted to similar regions [13], do contribute
to the power when similarity is not required. The same phenomenon also
occurs in certain problems considered earlier in this paper.

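For the case $\epsilon > \frac{1}{2}$, let us take

(65)    $\lambda(\sigma, \xi_{m+1}, \ldots, \xi_n) = \lambda(\sigma) \prod_{i=m+1}^{n} \lambda_i(\xi_i \mid \sigma).$

We shall select $\lambda(\sigma)$ such that $\lambda(\sigma)$ is constant when $\sigma > \sigma_1$. Hence it is enough
to define $\lambda_i(\xi_i \mid \sigma)$ for $\sigma \leq \sigma_1$. For any $\sigma \leq \sigma_1$ there exists by lemma 1 a function
$\lambda_i(\xi_i \mid \sigma)$ such that

(66)    $\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma^2} (x_i - \xi_i)^2\right] d\lambda_i(\xi_i \mid \sigma) = k \exp\left[-\frac{1}{2\sigma_1^2} (x_i - \xi_{i1})^2\right].$

For this choice of the $\lambda_i$, (9) becomes

(67)    $\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s} (x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2\right]\right\}}{\int_0^\infty \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{m} x_i^2\right] d\lambda(\sigma)} > k'.$

Next we choose $\lambda(\sigma)$ according to lemma 4 such that

(68)    $\int_0^\infty \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{m} x_i^2\right] d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2} \sum_{i=1}^{m} x_i^2 - c \sqrt{\sum_{i=1}^{m} x_i^2}\right],$

thus, by proper choice of $k'$, reducing (67) to

(69)    $\frac{\sum_{i=1}^{s} \xi_{i1} x_i}{\sqrt{\sum_{i=1}^{m} x_i^2}} > -c\,\sigma_1^2.$

The probability of this region under $H_0$ is independent of $\xi_{m+1}, \ldots, \xi_n$ and $\sigma$,
and hence (69) is most powerful for testing $H_0$ against $H_1$.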

Let us return once more to the problem of testing Student's hypothesis against
a simple alternative $\xi = \xi_1$, $\sigma = 1$ and let us assume as known that $\sigma \leq 1$. No
use can be made of this knowledge if consideration is restricted to similar regions.
For the probability of first kind error is an analytic function of $\sigma$, and consequently,
if a test is similar with respect to all values of $\sigma$ which are $\leq 1$, it is similar
with respect to all values of $\sigma$. Let us now consider this problem without the
restriction of similarity. If $\epsilon > \frac{1}{2}$, the knowledge concerning $\sigma$ does not enable
us to find a test which is more powerful than that given by (58), since the function
$\lambda(\sigma)$ on which (58) was based had all its points of increase for $\sigma \leq 1$.

On the other hand we may expect improvement for $\epsilon < \frac{1}{2}$ since the most
powerful test in this case was based on a function $\lambda$ with a single point of increase
$\sigma_0 > 1$ which is no longer admitted as a possible value of $\sigma$. If, instead, we take
for $\lambda$ the step function with a single jump at $\sigma = 1$ we obtain the critical region

(70)    $\frac{\exp\left[-\frac{1}{2} \sum (x_i - \xi_1)^2\right]}{\exp\left[-\frac{1}{2} \sum x_i^2\right]} > k,$

which is equivalent to

(71)    $\bar{x} > c.$

Here $c > 0$ since $\epsilon < \frac{1}{2}$, and therefore, when $\xi = 0$ the probability of (71) is an
increasing function of $\sigma$ and hence takes on its maximum at $\sigma = 1$. It follows
from theorem 1 that (71) is most powerful under the conditions stated.
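
The monotonicity argument for region (71) can be made concrete as follows. When $\xi = 0$ the sample mean is normal with variance $\sigma^2/n$, so the probability of $\bar{x} > c$ is $1 - \Phi(c\sqrt{n}/\sigma)$. The Python sketch below uses `statistics.NormalDist` from the standard library; the sample size and level are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def cutoff(eps, n):
    """c such that P{mean > c} = eps when xi = 0 and sigma = 1."""
    return STD.inv_cdf(1.0 - eps) / n ** 0.5

def size(c, n, sigma):
    """P{mean of n observations > c} when xi = 0 and the variance is sigma^2."""
    return 1.0 - STD.cdf(c * n ** 0.5 / sigma)

n, eps = 10, 0.05                      # hypothetical sample size and level
c = cutoff(eps, n)
sizes = [size(c, n, s) for s in (0.25, 0.5, 0.75, 1.0)]

# c is positive because eps < 1/2, and the size grows with sigma up to eps at sigma = 1.
print(c > 0, [round(v, 5) for v in sizes])
```

With a level above 1/2 the cutoff becomes negative and the same function decreases in $\sigma$, which is the reversal described for the opposite problem.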

In the opposite problem in which it is known that $\sigma \geq 1$, the situation is
reversed. For $\epsilon < \frac{1}{2}$ no improvement over (45) is possible while for $\epsilon > \frac{1}{2}$ we
can use for $\lambda$ the step function with a single step at $\sigma = 1$ thus obtaining the
critical region (70), in its equivalent form (71), but this time with $c < 0$. When $\xi = 0$ the probability of
this region is a decreasing function of $\sigma$ and it follows that (70) is most powerful
in this case.

Similar remarks apply to other problems. We mention as one further example
a modification of the Behrens-Fisher problem. Let $X_1, \ldots, X_n$ and
$Y_1, \ldots, Y_m$ be independently normally distributed, the $X$'s with mean $\xi$ and
variance $\sigma^2$, the $Y$'s with mean $\eta$ and variance $\tau^2$, all four parameters being unknown.
We wish to test, at level of significance $\epsilon < \frac{1}{2}$, the hypothesis $\xi = \eta$
against the simple alternative $\xi = \xi_1$, $\eta = \eta_1$, $\sigma = 1$, $\tau = 1$, where $\xi_1 \neq \eta_1$, and
we assume it known that $\sigma \leq 1$, $\tau \leq 1$. Basing the test on a step function $\lambda$
with a single jump at $\sigma = 1$, $\tau = 1$, $\xi = \eta = \frac{n\xi_1 + m\eta_1}{m + n}$, we obtain for $w$ the region

(73)    $\frac{\exp\left[-\frac{1}{2} \sum (x_i - \xi_1)^2 - \frac{1}{2} \sum (y_i - \eta_1)^2\right]}{\exp\left[-\frac{1}{2} \sum \left(x_i - \frac{n\xi_1 + m\eta_1}{n + m}\right)^2 - \frac{1}{2} \sum \left(y_i - \frac{n\xi_1 + m\eta_1}{n + m}\right)^2\right]} > k,$

which is equivalent to

(74)    $\bar{y} - \bar{x} > c \qquad (c > 0),$

if we assume, as we may without loss of generality, that $\eta_1 > \xi_1$. When $\eta = \xi$,
$\bar{y} - \bar{x}$ is normally distributed with zero mean and variance $\frac{\sigma^2}{n} + \frac{\tau^2}{m}$. Therefore
the probability of (74) is an increasing function of $\frac{\sigma^2}{n} + \frac{\tau^2}{m}$ and hence attains
its maximum when $\sigma = \tau = 1$. It follows from theorem 1 that the region
(74) is most powerful for the problem under consideration.


6. Admissibility. The general problem to be considered in this paper has
been formulated in section 1: To obtain a region $w$

(75)    maximizing $\int_w g(x)\, dx$

subject to the restriction

(76)    $\int_w f_\theta(x)\, dx \leq \epsilon$  for all $\theta \in \omega$.

Since for any particular such problem there may exist several essentially different
regions satisfying these conditions, it may happen that there exists a region $w'$
such that

(77)    $\int_{w'} g(x)\, dx = \int_w g(x)\, dx,$

and

(78)    $\int_{w'} f_\theta(x)\, dx \leq \int_w f_\theta(x)\, dx$  for all $\theta \in \omega$,

with strict inequality holding for some $\theta$. Clearly $w'$ is preferable to $w$. In this case,
following the definition of Wald [4], we say that $w$ is not admissible. We shall
rule out this possibility for a large class of problems by proving

Theorem 3. If $w$ satisfies the conditions of theorem 1, and if the set of points
$x$ for which equality holds in (8) has measure zero, then any region satisfying (75)
and (76) differs from $w$ only on a set of measure zero.
Proof. Without loss of generality we shall assume $\lambda$ of theorem 1 to be a
distribution function. Then

        $h(x) = \int f_\theta(x)\, d\lambda(\theta)$

is a completely specified probability density function, and $w$ is the unique* (up
to a set of measure zero) most powerful test for testing the simple hypothesis
$H_0'$: $h$ against the simple alternative $H_1$: $g$. Suppose now that $w'$ satisfies (75)
and (76). Then

(79)    $\int_{w'} h(x)\, dx \leq \epsilon,$

and $w'$ is most powerful for testing $H_0'$ against $H_1$. It follows that $w'$ differs
from $w$ at most by a null set.

* One sees this easily from Neyman and Pearson's proof of the fundamental lemma [1],
by using the assumption that the set of points for which equality holds in (8) has
measure zero.

Earlier we enlarged the problem of testing by adjoining to the original random
variable $X$ a random variable with a known distribution. This is equivalent
to the following modification of the original problem. Instead of defining a test
to be a critical region (of rejection) in the space of $x$, we define it to be a critical
function $\varphi$ $(0 \leq \varphi(x) \leq 1)$ which with every point $x$ associates a probability of
rejection $\varphi(x)$. If $x$ is observed, the hypothesis is rejected with probability $\varphi(x)$
according to a table of random numbers. In the case where random numbers
are not employed, $\varphi$ merely becomes the characteristic function of the set $w$.
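
A critical function of this kind can be simulated directly. The Python sketch below takes the single-observation region (60) written as a critical function, with $\varphi(x) = 1$ for $x > 0$ and $\varphi(x) = 2(\epsilon - \frac{1}{2})$ for $x < 0$, and estimates its size under the hypothesis for several values of $\sigma$. The pseudo-random generator stands in for the table of random numbers; the function names, the number of trials and the seed are illustrative assumptions.

```python
import random

def critical_function(x, eps):
    """phi(x) for region (60): reject when x > 0, otherwise with probability 2(eps - 1/2)."""
    return 1.0 if x > 0 else 2.0 * (eps - 0.5)

def estimated_size(eps, sigma, trials=100_000, seed=0):
    """Monte Carlo estimate of the rejection probability when X ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = rng.gauss(0.0, sigma)
        # rng.random() plays the role of the table of random numbers.
        if rng.random() < critical_function(x, eps):
            rejections += 1
    return rejections / trials

for sigma in (0.1, 1.0, 10.0):
    print(sigma, estimated_size(0.7, sigma))
```

The exact size is $\frac{1}{2} + \frac{1}{2} \cdot 2(\epsilon - \frac{1}{2}) = \epsilon$ whatever the value of $\sigma$, which is the similarity property used in the text; the simulation should reproduce this up to sampling error.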

We shall now state a theorem which will prove admissibility for all but one of
those problems treated in sections two to five, to which theorem 3 does not apply.

Theorem 4. Suppose $\omega = \{\theta\}$ is a subset of an $s$-dimensional Euclidean space,
and that for any measurable function $\varphi$ and for any set $S$ which has positive measure
and is contained in $\omega$

(80)    $\int \varphi(x) f_\theta(x)\, dx = c$  for $\theta \in S$

implies

(81)    $\int \varphi(x) f_\theta(x)\, dx = c$  for $\theta \in \omega$.

(Here and in all that follows whenever a region of integration is not indicated, the
integral extends over the whole $x$ space.) Suppose further that $\varphi$ is a critical function
satisfying the conditions of theorem 1 and that the set $S_0$ of points of increase of $\lambda$
has positive measure. Then $\varphi$ is admissible.

Proof. If $\varphi$ were not admissible there would exist $\varphi_1$ with

(82)    $\int \varphi_1(x) g(x)\, dx = \int \varphi(x) g(x)\, dx,$

(83)    $\int \varphi_1(x) f_\theta(x)\, dx \leq \int \varphi(x) f_\theta(x)\, dx$  for all $\theta \in \omega$;

(84)    $\int \varphi_1(x) f_\theta(x)\, dx < \int \varphi(x) f_\theta(x)\, dx$  for some $\theta \in \omega$.

The set $T$ of points $\theta$ for which (84) holds differs from $\omega$ at most by a null set.
For

(85)    $\int [\varphi_1(x) - \varphi(x)] f_\theta(x)\, dx = 0$  for $\theta \in \omega - T$,

and if $\omega - T$ had positive measure, (85) would hold for all $\theta \in \omega$.

Let $h$ and $H_0'$ be defined as in the proof of theorem 3. Since $S_0$ has positive
measure, it follows that

(86)    $\epsilon = \int \varphi(x) h(x)\, dx > \int \varphi_1(x) h(x)\, dx = \eta$, say.

Let $\varphi_2(x) = \min [1, \varphi_1(x) + \epsilon - \eta]$. Then

(87)    $\int \varphi_2(x) h(x)\, dx \leq \epsilon$

and

(88)    $\int \varphi_2(x) g(x)\, dx > \int \varphi_1(x) g(x)\, dx.$

But $\varphi$ is most powerful for testing $H_0'$ against $H_1$ and we have a contradiction.

By applying theorems 3 and 4 one can easily show for all but one of the problems
treated in sections three to five that the tests obtained there are admissible.
The one exception occurs when testing equality of variances. Simplifying the
notation, since we are now concerned with a special case, we shall assume that
$X_i$ $(i = 1, \ldots, n)$, $Y_1, \ldots, Y_r$ are independently and normally distributed,
the $X$'s with mean $\xi_0$ and variance $\sigma_0^2$, $Y_i$ with mean $\xi_i$ and variance $\sigma_i^2$, all parameters
being unknown. We wish to test the hypothesis of equality of variances
against the simple alternative

        $H_1$: $\xi_i = \xi_{i1}$, $\sigma_i = \sigma_{i1}$ $(i = 0, \ldots, r)$,

with

        $\sigma_{01} < \sigma_{11} < \cdots < \sigma_{r1}$.

We shall first consider the case $n = 1$, and prove admissibility of the critical
function

(89)    $\varphi(x, y_1, \ldots, y_r) = \epsilon$

by using a different distribution function for the parameters from the one used
earlier. With some specialization of the distribution function, (8) becomes
for our problem


(90) 


exp I ~ 


n Z)" (y. - £n)“) 


■ JjJ exp dXi'’(f.)j d AA 


> k 


For any $\sigma < \sigma_{01}$ we select the $\lambda_i^{\sigma}(\xi_i)$ according to lemma 1. If we then take for
$\mu$ the uniform distribution over $(\sigma_{01} - 1, \sigma_{01})$ the left hand side of (90) will reduce
to $k$. Admissibility of the critical function (89) then follows from theorem 4.

That a constant critical function is not admissible in the case $n > 1$ is easily
seen if one compares it for instance with the critical region


(91) 


X — foi 
Vs (an — x)® 


We shall not obtain a complete family of admissible tests (cf. [4]) for the case
$n > 1$ but we shall show that this problem is equivalent to the following one: To
find a complete class of unbiased admissible tests for the hypothesis specifying
the mean and variance of a normal distribution on the basis of a sample from
this distribution, the class of alternatives being the totality of univariate normal
distributions.

Let $n > 1$ and let $\varphi$ be any most powerful critical function for testing the
hypothesis of equality of variances against $H_1$. If $\varphi$ corresponds to the level of
significance $\epsilon$ and if $\beta_\varphi$ denotes the power of $\varphi$, we have

(92)    $\beta_\varphi(\sigma, \ldots, \sigma, \xi_0, \ldots, \xi_r) \leq \epsilon$

for all admissible values of the arguments. It also follows from section 4 that

(93)    $\beta_\varphi(\sigma_{01}, \sigma_{11}, \ldots, \sigma_{r1}, \xi_{01}, \xi_{11}, \ldots, \xi_{r1}) = \epsilon.$

Consider for a moment the hypothesis $H_0'$: $\sigma_i = \sigma_{01}$ $(i = 0, \ldots, r)$, $\xi_0 = \xi_{01}$, $\xi_i$
unspecified for $i = 1, \ldots, r$. It is easily seen that the maximum power for testing
$H_0'$ against $H_1$ is $\epsilon$. Therefore any most powerful test for testing $H_0$ against
$H_1$ is also most powerful for testing $H_0'$ against $H_1$, and in particular this holds
for $\varphi$. Furthermore, it follows easily from theorem 4 that for any most powerful
test of $H_0'$ against $H_1$ the probability of an error of the first kind must be identically
equal to $\epsilon$. Therefore

(94)    $\beta_\varphi(\sigma_{01}, \ldots, \sigma_{01}, \xi_{01}, \xi_1, \ldots, \xi_r) = \epsilon$  for all $\xi_1, \ldots, \xi_r$.

But (94) is equivalent to the condition that $\varphi$ is similar with respect to $\xi_1, \ldots,
\xi_r$, and it follows [12] that $\varphi$ is a function of $x_1, \ldots, x_n$ only. The problem is
therefore reduced to that of finding all admissible critical functions $\varphi(x_1, \ldots, x_n)$
satisfying

(95)    $\beta_\varphi(\sigma_{01}, \xi_{01}) = \epsilon, \qquad \beta_\varphi(\sigma_0, \xi_0) \leq \epsilon$  for all $\sigma_0$, $\xi_0$.

That this problem in turn is equivalent to the one stated above is immediate when
one considers the complementary critical functions $1 - \varphi$.

REFERENCES

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.
[2] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.
[3] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.
[4] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 549.
[5] J. Neyman, "On a statistical problem arising in routine analysis and in sampling inspection of mass production," Annals of Math. Stat., Vol. 12 (1941), p. 46.
[6] H. Scheffé, "On the theory of testing composite hypotheses with one constraint," Annals of Math. Stat., Vol. 13 (1942), p. 280.
[7] E. Lehmann, "On optimum tests of composite hypotheses with one constraint," Annals of Math. Stat., Vol. 18 (1947), p. 473.
[8] E. Lehmann and H. Scheffé, "On the problem of similar regions," Proc. Nat. Acad. Sci., Vol. 33 (1947), p. 382.
[9] A. Wald, "On the power function of the analysis of variance test," Annals of Math. Stat., Vol. 13 (1942), p. 434.
[10] P. L. Hsu, "On the power function of the E²-test and the T²-test," Annals of Math. Stat., Vol. 16 (1945), p. 278.
[11] H. K. Nandi, "On the average power of test criteria," Sankhyā, Vol. 8 (1946), p. 67.
[12] W. Feller, "Note on regions similar to the sample space," Stat. Res. Memoirs, Vol. 2 (1938), p. 117.
[13] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, Vol. 32 (1941), p. 62.
[14] S. Kolodziejczyk, "On an important class of statistical hypotheses," Biometrika, Vol. 27 (1935), p. 161.
[15] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), p. 126.
[16] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), p. 279.



SYMBOLIC MATRIX DERIVATIVES

By Paul S. Dwyer and M. S. MacPhail
University of Michigan and Queen's University

Summary. Let $X$ be the matrix $[x_{mn}]$, $t$ a scalar, and let $\partial X/\partial t$, $\partial t/\partial X$ denote
the matrices $[\partial x_{mn}/\partial t]$, $[\partial t/\partial x_{mn}]$ respectively. Let $Y = [y_{pq}]$ be any
matrix product involving $X$, $X'$ and independent matrices, for example $Y =
AXBX'C$. Consider the matrix derivatives $\partial Y/\partial x_{mn}$, $\partial y_{pq}/\partial X$. Our purpose
is to devise a systematic method for calculating these derivatives. Thus if
$Y = AX$, we find that $\partial Y/\partial x_{mn} = AJ_{mn}$, $\partial y_{pq}/\partial X = A'K_{pq}$, where $J_{mn}$ is a
matrix of the same dimensions as $X$, with all elements zero except for a unit in
the $m$-th row and $n$-th column, and $K_{pq}$ is similarly defined with respect to $Y$.
We consider also the derivatives of sums, differences, powers, the inverse matrix
and the function of a function, thus setting up a matrix analogue of elementary
differential calculus. This is designed for application to statistics, and gives a
concise and suggestive method for treating such topics as multiple regression
and canonical correlation.


1. Introduction. The derivative of a matrix with respect to a scalar

(1)    $\frac{\partial Y}{\partial x} = \frac{\partial}{\partial x} [y_{pq}] = \left[\frac{\partial y_{pq}}{\partial x}\right]$

is well known and commonly used. The symbolic derivative obtained by applying
a matrix of differential operators to a scalar

(2)    $\frac{\partial y}{\partial X} = \left[\frac{\partial}{\partial x_{mn}}\right] y = \left[\frac{\partial y}{\partial x_{mn}}\right]$

is not in such general use though some authors give special cases. For example,
if $A$ is a symmetric matrix and $X$ a column matrix, so that $y = X'AX$ is a quadratic
form, Frazer, Duncan and Collar [1, p. 48] write

(3)    $\begin{bmatrix} \partial/\partial x_1 \\ \partial/\partial x_2 \\ \vdots \end{bmatrix} y = 2AX$

to indicate concisely the result of differentiating $y$ with respect to the elements
$x_i$ of $X$.

It is to be noted that the matrix in (1) has the same dimensions (numbers
of rows and columns) as the matrix $Y$, while the matrix in (2) has the dimensions
of the matrix $X$.

We present an illustration of each of these types of symbolic matrix derivatives
in order to clarify the concepts. Thus if


ive have 



T 


3.*-'' 

Y = 

jY 

.sin * 

log. xj, 

dY _ 

"1 

6.*' 

-12.*"‘ 

9* 

X 

J 

cos X 


.*31*12 and 





*11 *12 


X = 

*21 .*22 




_ *31 *32 



we have 


dij 

fx 


0 

— Il2 


-X31 

0 

Xn 


Suppose $Y$ is any matrix product involving $X$, $X'$ and independent matrices,
for example, $Y = AXBX'C$. We may fix an element $x_{mn}$ of $X$ and form the
matrix

(4)    $\frac{\partial Y}{\partial x_{mn}},$

or we may fix an element $y_{pq}$ of $Y$ and form the matrix

(5)    $\frac{\partial y_{pq}}{\partial X}.$

The purpose of this paper is to devise a systematic method for calculating these
matrices, and to give various applications in the general field of statistics.

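By way of introduction we take the matrix product $Y = AX$ where

        $A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$  and  $X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix}$

so that

        $Y = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{21} + a_{13}x_{31} & a_{11}x_{12} + a_{12}x_{22} + a_{13}x_{32} \\ a_{21}x_{11} + a_{22}x_{21} + a_{23}x_{31} & a_{21}x_{12} + a_{22}x_{22} + a_{23}x_{32} \end{bmatrix}.$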



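We have then

        $\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix}, \qquad \frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{21} \end{bmatrix},$

        $\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} a_{12} & 0 \\ a_{22} & 0 \end{bmatrix}, \qquad \frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix},$

        $\frac{\partial Y}{\partial x_{31}} = \begin{bmatrix} a_{13} & 0 \\ a_{23} & 0 \end{bmatrix}, \qquad \frac{\partial Y}{\partial x_{32}} = \begin{bmatrix} 0 & a_{13} \\ 0 & a_{23} \end{bmatrix}.$

These six equations can be combined in the single one

(6)    $\frac{\partial Y}{\partial x_{mn}} = AJ_{mn},$

where $J_{mn}$ is a matrix having dimensions of $X$, with all elements zero except for
a unit element in the $m$-th row and $n$-th column. Similarly we find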



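        $\frac{\partial y_{11}}{\partial X} = \begin{bmatrix} a_{11} & 0 \\ a_{12} & 0 \\ a_{13} & 0 \end{bmatrix}, \qquad \frac{\partial y_{12}}{\partial X} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{12} \\ 0 & a_{13} \end{bmatrix},$

        $\frac{\partial y_{21}}{\partial X} = \begin{bmatrix} a_{21} & 0 \\ a_{22} & 0 \\ a_{23} & 0 \end{bmatrix}, \qquad \frac{\partial y_{22}}{\partial X} = \begin{bmatrix} 0 & a_{21} \\ 0 & a_{22} \\ 0 & a_{23} \end{bmatrix}.$

These four equations can be combined in the single one

(7)    $\frac{\partial y_{pq}}{\partial X} = A'K_{pq},$

where $K_{pq}$ is the matrix having the dimensions of $Y$ with all elements zero except
for a unit element in the $p$-th row and $q$-th column.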
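
Formulas (6) and (7) can be confirmed mechanically for the 2 x 3 and 3 x 2 example. The Python sketch below uses plain lists and arbitrary integer entries (an assumption made for illustration); because $Y = AX$ is linear in $X$, the change in $Y$ produced by adding 1 to $x_{mn}$ equals the derivative exactly. The helper names are hypothetical.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    """rows x cols matrix with a single 1 in row i, column j (1-based, as in the text)."""
    return [[int((r, c) == (i, j)) for c in range(1, cols + 1)] for r in range(1, rows + 1)]

A = [[1, 2, 3], [4, 5, 6]]            # 2 x 3, arbitrary integers
X = [[7, 8], [9, 10], [11, 12]]       # 3 x 2, arbitrary integers

def dY_dx(m, n):
    """dY/dx_mn for Y = AX, as an exact unit difference (Y is linear in X)."""
    J = unit(3, 2, m, n)
    X_plus = [[x + j for x, j in zip(rx, rj)] for rx, rj in zip(X, J)]
    Y0, Y1 = matmul(A, X), matmul(A, X_plus)
    return [[y1 - y0 for y0, y1 in zip(r0, r1)] for r0, r1 in zip(Y0, Y1)]

def dy_dX(p, q):
    """Matrix [dy_pq/dx_mn], which has the dimensions of X."""
    return [[dY_dx(m, n)[p - 1][q - 1] for n in (1, 2)] for m in (1, 2, 3)]

# Compare with A J_mn (equation 6) and A' K_pq (equation 7).
eq6 = all(dY_dx(m, n) == matmul(A, unit(3, 2, m, n)) for m in (1, 2, 3) for n in (1, 2))
eq7 = all(dy_dX(p, q) == matmul(transpose(A), unit(2, 2, p, q)) for p in (1, 2) for q in (1, 2))
print(eq6, eq7)
```

Here `unit(3, 2, m, n)` plays the role of $J_{mn}$ and `unit(2, 2, p, q)` the role of $K_{pq}$.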

It should be noted that the matrices on the left of (6) and (7) are matrices composed
of the basic elements $\partial y_{pq}/\partial x_{mn}$.

Other types of symbolic matrix derivatives could be defined and studied. We
have selected these two main types because of their application to regression and
correlation theory. The second type is more specifically indicated in the applications
but the relations between the types are such that a simultaneous treatment
seems appropriate.


2. Notation. Capital letters are used for matrices and small letters for
scalars. It is understood that $Y, U, V, \ldots$ are matrices whose elements are
functions of the elements $x_{mn}$ of $X$ and that $A, B, \ldots$ (unless otherwise stated)
are matrices whose elements are not functions of $x_{mn}$. In the development of
the formulas it is understood that the differentiation is carried out with respect
to $x_{mn}$ or $X$. The matrix function differentiated is called $Y$.

We have already defined $J_{mn}$ as the matrix having the dimensions of $X$ with
all elements zero except for a unit element in the $m$-th row and the $n$-th column,
and we define $K_{pq}$ similarly with respect to $Y$. We now define $J'_{nm}$ as the matrix
having the dimensions of $X'$ with all elements zero except for a unit element in
the $n$-th row and the $m$-th column, and we define $K'_{qp}$ similarly with respect
to $Y'$. All the formulas we obtain for $\partial Y/\partial x_{mn}$ involve $J_{mn}$ or $J'_{nm}$ while all those
for $\partial y_{pq}/\partial X$ involve $K_{pq}$ or $K'_{qp}$.

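3. Differentiation of a constant. If $Y = A = [a_{pq}]$ we have at once

        $\frac{\partial y_{pq}}{\partial x_{mn}} = 0.$

It follows that

(8)    $\frac{\partial A}{\partial x_{mn}} = 0,$

(9)    $\frac{\partial a_{pq}}{\partial X} = 0,$

where the zero matrix of (8) has the dimensions of $A$, while that of (9) has the
dimensions of $X$.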


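4. Differentiation of a matrix with respect to itself. If $Y = X = [x_{pq}]$ we note
that

        $\frac{\partial x_{pq}}{\partial x_{mn}} = \begin{cases} 1 & (p = m,\ q = n); \\ 0 & (\text{otherwise}). \end{cases}$

It follows that

(10)    $\frac{\partial X}{\partial x_{mn}} = J_{mn},$

(11)    $\frac{\partial x_{pq}}{\partial X} = K_{pq}.$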

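5. Differentiation of the transpose of a matrix with respect to the matrix.
Let $Y = X'$, so that

        $y_{pq} = x_{qp}.$

Then

        $\frac{\partial y_{pq}}{\partial x_{mn}} = \begin{cases} 1 & (q = m,\ p = n); \\ 0 & (\text{otherwise}), \end{cases}$

and we have

(12)    $\frac{\partial X'}{\partial x_{mn}} = J'_{nm},$

(13)    $\frac{\partial x'_{pq}}{\partial X} = K'_{qp},$

where $J'_{nm}$, $K'_{qp}$ are defined as in section 2.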

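6. Differentiation of sums and differences of matrices. If

        $Y = U + V - W = [u_{pq} + v_{pq} - w_{pq}],$

we have

        $\frac{\partial y_{pq}}{\partial x_{mn}} = \frac{\partial u_{pq}}{\partial x_{mn}} + \frac{\partial v_{pq}}{\partial x_{mn}} - \frac{\partial w_{pq}}{\partial x_{mn}};$

then

(14)    $\frac{\partial Y}{\partial x_{mn}} = \frac{\partial}{\partial x_{mn}} [y_{pq}] = \frac{\partial}{\partial x_{mn}} [u_{pq}] + \frac{\partial}{\partial x_{mn}} [v_{pq}] - \frac{\partial}{\partial x_{mn}} [w_{pq}] = \frac{\partial U}{\partial x_{mn}} + \frac{\partial V}{\partial x_{mn}} - \frac{\partial W}{\partial x_{mn}},$

and similarly

(15)    $\frac{\partial y_{pq}}{\partial X} = \frac{\partial u_{pq}}{\partial X} + \frac{\partial v_{pq}}{\partial X} - \frac{\partial w_{pq}}{\partial X}.$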

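7. General formulas for the differentiation of a two factor matrix product.
Suppose $U$ is a matrix with $c$ rows and $d$ columns and $V$ is a matrix with $d$ rows
and $e$ columns, then

(16)    $Y = UV = [y_{pq}] = \left[\sum_{s=1}^{d} u_{ps} v_{sq}\right].$

We have at once

(17)    $\frac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s=1}^{d} \frac{\partial u_{ps}}{\partial x_{mn}} v_{sq} + \sum_{s=1}^{d} u_{ps} \frac{\partial v_{sq}}{\partial x_{mn}}.$

Now considering any fixed $x_{mn}$ it is clear that the first term on the right of (17)
is the same as the right hand term of (16) with $\partial u_{ps}/\partial x_{mn}$ in place of $u_{ps}$. The second
term on the right of (17) is likewise the same as the right hand term of (16) with
$\partial v_{sq}/\partial x_{mn}$ in place of $v_{sq}$. We may then write

(18)    $\frac{\partial Y}{\partial x_{mn}} = \frac{\partial U}{\partial x_{mn}} V + U \frac{\partial V}{\partial x_{mn}}.$

Also considering a fixed $y_{pq}$ we have

(19)    $\frac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \frac{\partial u_{ps}}{\partial X} v_{sq} + \sum_{s=1}^{d} u_{ps} \frac{\partial v_{sq}}{\partial X}.$

It is to be noted that this formula yields matrices of the proper dimensions (those
of $X$) since $\partial u_{ps}/\partial X$ and $\partial v_{sq}/\partial X$ have the dimensions of $X$. These matrices, when
multiplied by the scalar values $v_{sq}$ and $u_{ps}$ and summed, yield matrices of the
desired dimensions.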
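
As a check on the product rule (18), the following Python sketch takes the hypothetical factors $U = AX$ and $V = X'B$, with arbitrary integer matrices $A$, $B$, $X$ chosen only for illustration, and compares the right side of (18) with a central difference of $Y = UV$. Since $Y$ is quadratic in $X$, the central difference with unit step is exact. The derivatives of the two factors are obtained entry by entry as $AJ_{mn}$ and $J'_{nm}B$.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def unit(rows, cols, i, j):
    """rows x cols matrix with a single 1 in row i, column j (1-based)."""
    return [[int((r, c) == (i, j)) for c in range(1, cols + 1)] for r in range(1, rows + 1)]

A = [[1, 2, 3], [4, 5, 6]]            # 2 x 3 constant matrix
B = [[1, 0], [2, -1], [0, 3]]         # 3 x 2 constant matrix
X = [[7, 8], [9, 10], [11, 12]]       # 3 x 2 variable matrix

def Y_of(Z):
    """Y = UV with U = AZ and V = Z'B."""
    return matmul(matmul(A, Z), matmul(transpose(Z), B))

def central_difference(m, n):
    """dY/dx_mn from Y(X + J) - Y(X - J); exact here because Y is quadratic in X."""
    J = unit(3, 2, m, n)
    up = Y_of([[x + j for x, j in zip(rx, rj)] for rx, rj in zip(X, J)])
    down = Y_of([[x - j for x, j in zip(rx, rj)] for rx, rj in zip(X, J)])
    return [[(u - d) // 2 for u, d in zip(ru, rd)] for ru, rd in zip(up, down)]

def product_rule(m, n):
    """Right side of (18): (dU/dx_mn) V + U (dV/dx_mn)."""
    J = unit(3, 2, m, n)
    U, V = matmul(A, X), matmul(transpose(X), B)
    dU, dV = matmul(A, J), matmul(transpose(J), B)
    return add(matmul(dU, V), matmul(U, dV))

agree = all(central_difference(m, n) == product_rule(m, n)
            for m in (1, 2, 3) for n in (1, 2))
print(agree)
```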


8. Some properties of matrix products involving J's and K's. Before deriving
formulas for the differentiation of products of specific factors, it seems wise to
derive some formulas exhibiting certain relations involving the $J$'s and $K$'s.
Consider the matrix $A$ having $c$ rows and $d$ columns and the matrix $X$ having $d$
rows and $e$ columns. Then $Y = AX$ is a matrix with $c$ rows and $e$ columns,
$J_{mn}$ one with $d$ rows and $e$ columns, $J'_{nm}$ one with $e$ rows and $d$ columns, $K_{pq}$ one with
$c$ rows and $e$ columns and $K'_{qp}$ one with $e$ rows and $c$ columns.
It is easily seen by actual multiplication that

(20) $AJ_{mn}$ is a $c \times e$ matrix with all its elements zero except those of its $n$-th column
which are those of the $m$-th column of $A$. We omit further discussion of the dimensions
of the matrices and assume that whenever a matrix product is written,
the factors are conformable. Then we can show similarly that

(21) $J_{mn}B$ is a matrix with all its elements zero except those of its $m$-th row, which
are those of the $n$-th row of $B$. Similar statements hold if $J_{mn}$ is replaced by $J'_{nm}$
or $K_{pq}$ or $K'_{qp}$. The rules are

(a) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the postmultiplier, the first subscript
indicates the column of the other matrix which is placed in the column
indicated by the second subscript.

(b) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the premultiplier, the second subscript
indicates the row of the other matrix which is placed in the row indicated
by the first subscript.

Notice also that

(22) $A'K_{pq}$ is a matrix with all its elements zero except those of its $q$-th column, which
are those of the $p$-th column of $A'$, or the $p$-th row of $A$. A similar result holds if
$K_{pq}$ is replaced by $K'_{qp}$ or $J_{mn}$ or $J'_{nm}$.
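
Rules (a) and (b) and statement (22) are easy to see on a small case. In the Python sketch below the matrices $A$ and $B$ hold arbitrary integers chosen for illustration, and `unit(rows, cols, i, j)` is a hypothetical helper that builds a matrix with a single unit element, standing in for $J_{32}$ and $K_{21}$.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    """rows x cols matrix with a single 1 in row i, column j (1-based)."""
    return [[int((r, c) == (i, j)) for c in range(1, cols + 1)] for r in range(1, rows + 1)]

A = [[1, 2, 3], [4, 5, 6]]             # c x d = 2 x 3
B = [[1, 2, 3, 4], [5, 6, 7, 8]]       # 2 x 4, conformable with a 3 x 2 premultiplier

J32 = unit(3, 2, 3, 2)                 # dimensions of a 3 x 2 matrix X
K21 = unit(2, 2, 2, 1)                 # dimensions of Y = AX

AJ = matmul(A, J32)                    # rule (a): column 3 of A moves to column 2
JB = matmul(J32, B)                    # rule (b): row 2 of B moves to row 3
AtK = matmul(transpose(A), K21)        # (22): row 2 of A appears as column 1

for name, M in (("AJ", AJ), ("JB", JB), ("A'K", AtK)):
    print(name, M)
```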


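9. Differentiation of specific two factor products. Let us start with $Y =
AX$ where the various matrices involved have the dimensions indicated in the
last section. Application of (18), (8), (10) gives

(23)    $\frac{\partial Y}{\partial x_{mn}} = \frac{\partial A}{\partial x_{mn}} X + A \frac{\partial X}{\partial x_{mn}} = AJ_{mn},$

while application of (19), (11) yields

(24)    $\frac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \frac{\partial a_{ps}}{\partial X} x_{sq} + \sum_{s=1}^{d} a_{ps} \frac{\partial x_{sq}}{\partial X} = \sum_{s=1}^{d} a_{ps} K_{sq}$

        $= a_{p1}K_{1q} + a_{p2}K_{2q} + \cdots + a_{pd}K_{dq}$

        = a $d \times e$ matrix with all elements zero except those of its $q$-th column
        which are those of the $p$-th row of $A$

        $= A'K_{pq}$  by (22).

Similar treatment of $Y = XB$ yields

(25)    $\frac{\partial Y}{\partial x_{mn}} = \frac{\partial X}{\partial x_{mn}} B = J_{mn}B,$

(26)    $\frac{\partial y_{pq}}{\partial X} = \sum_{s} \frac{\partial x_{ps}}{\partial X} b_{sq} = \sum_{s} K_{ps} b_{sq} = K_{pq}B'.$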


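If we treat $Y = AX'$ in a similar fashion, we get

(27)    $\frac{\partial Y}{\partial x_{mn}} = AJ'_{nm},$

(28)    $\frac{\partial y_{pq}}{\partial X} = K'_{qp}A,$

while $Y = X'B$ yields

(29)    $\frac{\partial Y}{\partial x_{mn}} = J'_{nm}B,$

(30)    $\frac{\partial y_{pq}}{\partial X} = BK'_{qp}.$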

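It is to be noted that J always has the subscripts mn, and similarly we find always
$J'_{nm}$, $K_{pq}$, $K'_{qp}$. We may therefore omit the subscripts on these letters. When
we do so we shall also write

        $\dfrac{\partial Y}{\partial\langle X\rangle}$ for $\dfrac{\partial Y}{\partial x_{mn}}$,    $\dfrac{\partial\langle Y\rangle}{\partial X}$ for $\dfrac{\partial y_{pq}}{\partial X}$,

placing brackets $\langle\ \rangle$ around the matrix from which a fixed element is to be
chosen. Thus if Y = AX, we write instead of (23) and (24)

(23a)   $\dfrac{\partial Y}{\partial\langle X\rangle} = AJ,$

(24a)   $\dfrac{\partial\langle Y\rangle}{\partial X} = A'K.$

The other results are summarized in lines 1-5 of Table I.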
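Lines 2-5 of Table I can be confirmed numerically. Because each product is linear in X, a unit forward difference gives the partial derivatives exactly. The following is a minimal sketch; the function `element_gradient`, the zero-based indices, and the test matrices are all assumptions chosen for illustration.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    """Zero matrix with a single 1 at (i, j), zero-based."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(cols)] for r in range(rows)]

def element_gradient(f, X, p, q):
    """Matrix of partials of element (p, q) of f(X) with respect to each x_ij.
    A unit forward difference is exact here because f is linear in X."""
    base = f(X)[p][q]
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xh = [r[:] for r in X]
            Xh[i][j] += 1
            row.append(f(Xh)[p][q] - base)
        grad.append(row)
    return grad

X = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
A = [[2, 0, 1], [1, 3, 2]]          # Y = AX   is 2 x 2
B = [[1, 2, 0], [4, 1, 3]]          # Y = XB   is 3 x 3
A2 = [[2, 1], [0, 3]]               # Y = A2 X' is 2 x 3
B2 = [[1, 2], [0, 1], [3, 1]]       # Y = X'B2 is 2 x 2
p, q = 0, 1

checks = {
    "AX":  element_gradient(lambda Z: matmul(A, Z), X, p, q)
           == matmul(transpose(A), unit(2, 2, p, q)),              # A'K
    "XB":  element_gradient(lambda Z: matmul(Z, B), X, p, q)
           == matmul(unit(3, 3, p, q), transpose(B)),              # KB'
    "AX'": element_gradient(lambda Z: matmul(A2, transpose(Z)), X, p, q)
           == matmul(transpose(unit(2, 3, p, q)), A2),             # K'A
    "X'B": element_gradient(lambda Z: matmul(transpose(Z), B2), X, p, q)
           == matmul(B2, transpose(unit(2, 2, p, q))),             # BK'
}
```

Every entry of `checks` should be true for any choice of (p, q) within the dimensions of Y.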
Examination of (18) and (19) shows that the derivatives of products with
two variable factors are obtained by adding the results obtained by holding
each factor constant while differentiating the other. With this in mind, (23)-(30)
can be used to obtain the derivatives of double products involving X and X'.
Thus if Y = XX, we get

(31)    $\dfrac{\partial Y}{\partial\langle X\rangle} = JX + XJ, \qquad \dfrac{\partial\langle Y\rangle}{\partial X} = KX' + X'K.$

Other double product formulas involving X and X' are given in Table I.


TABLE I

Formula   Y       ∂Y/∂⟨X⟩         ∂⟨Y⟩/∂X

   1      AB      0               0
   2      AX      AJ              A'K
   3      XB      JB              KB'
   4      AX'     AJ'             K'A
   5      X'B     J'B             BK'
   6      XX      JX + XJ         KX' + X'K
   7      X'X     J'X + X'J       XK' + XK
   8      XX'     JX' + XJ'       KX + K'X
   9      X'X'    J'X' + X'J'     X'K' + K'X'


The formulas for $\partial Y/\partial\langle X\rangle$ are written down very easily, but those for
$\partial\langle Y\rangle/\partial X$ are not so easy to write. However the values of $\partial Y/\partial\langle X\rangle$ and
$\partial\langle Y\rangle/\partial X$ in formulas 2-5 of Table I are such that the results for
$\partial\langle Y\rangle/\partial X$ may be obtained from those for $\partial Y/\partial\langle X\rangle$ with the use of a few
simple rules. They are

(a) Each J becomes K and each J' becomes K'.

(b) The pre (or post) multiplier of J becomes its transpose.

(c) The pre (or post) multiplier of J' becomes a post (or pre) multiplier of K'.

These rules are immediately applicable to the double products. Thus when
Y = X'X we have

        $\dfrac{\partial Y}{\partial\langle X\rangle} = J'X + X'J,$

and so

        $\dfrac{\partial\langle Y\rangle}{\partial X} = XK' + XK.$
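Rules (a)-(c) are mechanical enough to be applied by a short program. The sketch below treats each term as a string of single capital letters with optional primes; the function names and this string convention are assumptions made for the illustration, not part of the paper.

```python
import re

def flip(symbol):
    """Transpose one factor symbol: A -> A', A' -> A."""
    return symbol[:-1] if symbol.endswith("'") else symbol + "'"

def transpose_product(factors):
    """(F1 F2 ... Fk)' = Fk' ... F2' F1'."""
    return [flip(f) for f in reversed(factors)]

def convert_term(factors):
    """Apply rules (a)-(c) to one term containing exactly one J or J'."""
    if "J" in factors:
        i = factors.index("J")
        pre, post = factors[:i], factors[i + 1:]
        # rules (a), (b): J -> K, each multiplier becomes its transpose
        return transpose_product(pre) + ["K"] + transpose_product(post)
    i = factors.index("J'")
    pre, post = factors[:i], factors[i + 1:]
    # rules (a), (c): J' -> K', pre and post multipliers change sides
    return post + ["K'"] + pre

def convert(expression):
    """Turn the dY/d<X> expression into the d<Y>/dX expression."""
    terms = [re.findall(r"[A-Z]'?", t) for t in expression.split("+")]
    return " + ".join("".join(convert_term(t)) for t in terms)

result = convert("J'X + X'J")   # the worked case Y = X'X
```

For the case Y = X'X treated above, `convert("J'X + X'J")` gives `XK' + XK`, and the same function reproduces the second column of Table I from the first.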


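10. Differentiation of three (or more) factor products. Products with three
factors can be differentiated by the formulas of the last section if two adjacent
factors are constant. Thus if Y = ABX, we have

        $\dfrac{\partial Y}{\partial\langle X\rangle} = ABJ, \qquad \dfrac{\partial\langle Y\rangle}{\partial X} = B'A'K.$

It is not yet demonstrated that these rules are applicable to the products AXB
and AX'B. However it can be shown by the general methods indicated earlier
that if Y = AXB, we obtain

(33)    $\dfrac{\partial Y}{\partial\langle X\rangle} = AJB, \qquad \dfrac{\partial\langle Y\rangle}{\partial X} = A'KB',$

while if Y = AX'B we have

(34)    $\dfrac{\partial Y}{\partial\langle X\rangle} = AJ'B, \qquad \dfrac{\partial\langle Y\rangle}{\partial X} = BK'A.$

It is now apparent that the rules of the last section apply to situations in which
there are both pre and post multipliers.

The general theory for two-factor products is immediately extendable. Thus
if Y = UVW with $y_{pq} = \sum_{r}\sum_{s} u_{pr}v_{rs}w_{sq}$ then the basic element is

(35)    $\dfrac{\partial y_{pq}}{\partial x_{mn}} = \sum_{r}\sum_{s}\left[\dfrac{\partial u_{pr}}{\partial x_{mn}}v_{rs}w_{sq} + u_{pr}\dfrac{\partial v_{rs}}{\partial x_{mn}}w_{sq} + u_{pr}v_{rs}\dfrac{\partial w_{sq}}{\partial x_{mn}}\right]$

and the formulas result from treating each factor in turn as the only variable.
For example if Y = XX'X, we have

(36)    $\dfrac{\partial Y}{\partial\langle X\rangle} = JX'X + XJ'X + XX'J,$

and

(37)    $\dfrac{\partial\langle Y\rangle}{\partial X} = K(X'X)' + XK'X + (XX')'K = KX'X + XK'X + XX'K.$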
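Formulas (36) and (37) can be verified exactly for a small integer matrix. Each element of XX'X is a polynomial of degree at most three in any single $x_{mn}$, so a five-point difference stencil (exact for polynomials up to degree four) returns the true derivative. This is a sketch under that assumption; the helper names and the sample matrix are illustrative only.

```python
from fractions import Fraction

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(*Ms):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def unit(rows, cols, i, j):
    return [[1 if (r, c) == (i, j) else 0 for c in range(cols)] for r in range(rows)]

def Y(X):
    return matmul(matmul(X, transpose(X)), X)      # Y = XX'X

def dY_dx(X, m, n):
    """Exact dY/dx_mn by the five-point stencil with step 1."""
    def shifted(h):
        Z = [row[:] for row in X]
        Z[m][n] += h
        return Y(Z)
    P2, P1, M1, M2 = shifted(2), shifted(1), shifted(-1), shifted(-2)
    return [[Fraction(-P2[i][j] + 8 * P1[i][j] - 8 * M1[i][j] + M2[i][j], 12)
             for j in range(len(P1[0]))] for i in range(len(P1))]

X = [[1, 2, 0], [3, -1, 2]]                        # 2 x 3
Xt = transpose(X)
rows, cols = len(X), len(X[0])
cells = [(i, j) for i in range(rows) for j in range(cols)]

ok36 = True
for m, n in cells:
    J = unit(rows, cols, m, n)
    Jt = transpose(J)
    formula = add(matmul(matmul(J, Xt), X), matmul(matmul(X, Jt), X),
                  matmul(matmul(X, Xt), J))        # JX'X + XJ'X + XX'J
    ok36 = ok36 and formula == dY_dx(X, m, n)

ok37 = True
for p, q in cells:
    K = unit(rows, cols, p, q)
    Kt = transpose(K)
    formula = add(matmul(matmul(K, Xt), X), matmul(matmul(X, Kt), X),
                  matmul(matmul(X, Xt), K))        # KX'X + XK'X + XX'K
    numeric = [[dY_dx(X, m, n)[p][q] for n in range(cols)] for m in range(rows)]
    ok37 = ok37 and formula == numeric
```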

The symbolic derivatives of certain triple product matrices are presented in
Table II.

The rules are sufficiently general to take care of matrices with more than
three factors. Thus if Y = A'X'XB, we have

(38)    $\dfrac{\partial Y}{\partial\langle X\rangle} = A'J'XB + A'X'JB$

and

(39)    $\dfrac{\partial\langle Y\rangle}{\partial X} = XBK'A' + XAKB',$

and in the special case B = A, we get

(41)    $\dfrac{\partial\langle Y\rangle}{\partial X} = XA(K' + K)A'.$

Similarly if Y = X'A'AX, we get

(42)    $\dfrac{\partial Y}{\partial\langle X\rangle} = J'A'AX + X'A'AJ,$

and

(43)    $\dfrac{\partial\langle Y\rangle}{\partial X} = A'AXK' + A'AXK.$


TABLE II

Formula   Y        ∂Y/∂⟨X⟩                     ∂⟨Y⟩/∂X

   1      ABC      0                           0
   2      ABX      ABJ                         B'A'K
   3      AXC      AJC                         A'KC'
   4      XBC      JBC                         KC'B'
   5      ABX'     ABJ'                        K'AB
   6      AX'C     AJ'C                        CK'A
   7      X'BC     J'BC                        BCK'
   8      AXX      AJX + AXJ                   A'KX' + X'A'K
   9      XBX      JBX + XBJ                   KX'B' + B'X'K
  10      XXC      JXC + XJC                   KC'X' + X'KC'
  11      AX'X'    AJ'X' + AX'J'               X'K'A + K'AX'
  12      X'BX'    J'BX' + X'BJ'               BX'K' + K'X'B
  13      X'X'C    J'X'C + X'J'C               X'CK' + CK'X'
  14      AX'X     AJ'X + AX'J                 XK'A + XA'K
  15      X'BX     J'BX + X'BJ                 BXK' + B'XK
  16      X'XC     J'XC + X'JC                 XCK' + XKC'
  17      AXX'     AJX' + AXJ'                 A'KX + K'AX
  18      XBX'     JBX' + XBJ'                 KXB' + K'XB
  19      XX'C     JX'C + XJ'C                 KC'X + CK'X
  20      XXX      JXX + XJX + XXJ             KX'X' + X'KX' + X'X'K
  21      XXX'     JXX' + XJX' + XXJ'          KXX' + X'KX + K'XX
  22      XX'X     JX'X + XJ'X + XX'J          KX'X + XK'X + XX'K
  23      X'XX     J'XX + X'JX + X'XJ          XXK' + XKX' + X'XK
  24      XX'X'    JX'X' + XJ'X' + XX'J'       KXX + X'K'X + K'XX'
  25      X'XX'    J'XX' + X'JX' + X'XJ'       XX'K' + XKX + K'X'X
  26      X'X'X    J'X'X + X'J'X + X'X'J       X'XK' + XK'X' + XXK
  27      X'X'X'   J'X'X' + X'J'X' + X'X'J'    X'X'K' + X'K'X' + K'X'X'


Finally if Y = XAX'AX, we get

(44)    $\dfrac{\partial Y}{\partial\langle X\rangle} = JAX'AX + XAJ'AX + XAX'AJ,$

(45)    $\dfrac{\partial\langle Y\rangle}{\partial X} = KX'A'XA' + AXK'XA + A'XA'X'K.$


11. Vector results. It should be emphasized that each of the above results
is a general result. More specific results may be obtained in case one (or more)
of the matrices is a vector. For example if $X_c$ is a column matrix and
$Y = X_c'BX_c$, then Y is a scalar, so K and K' are both unity and we have from
Table II (15)

(46)    $\dfrac{\partial\langle Y\rangle}{\partial X_c} = BX_c + B'X_c = (B + B')X_c.$

If in addition B is symmetric, B' = B and we have

        $\dfrac{\partial\langle Y\rangle}{\partial X_c} = 2BX_c,$

which is the result indicated in (3).
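Result (46) is the familiar gradient of a quadratic form and is easy to confirm. Since the form is quadratic, a central difference with unit step is exact in integer arithmetic. The names and the values of B and x below are assumptions for the example.

```python
def quad_form(B, x):
    """The scalar x'Bx."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

def gradient_by_differences(B, x):
    """Central differences with step 1, exact for a quadratic form."""
    grad = []
    for i in range(len(x)):
        up, down = x[:], x[:]
        up[i] += 1
        down[i] -= 1
        grad.append((quad_form(B, up) - quad_form(B, down)) // 2)
    return grad

def gradient_by_formula(B, x):
    """(46): (B + B')x."""
    n = len(x)
    return [sum((B[i][j] + B[j][i]) * x[j] for j in range(n)) for i in range(n)]

B = [[1, 2], [3, 4]]      # deliberately not symmetric
x = [5, 7]
numeric = gradient_by_differences(B, x)
formula = gradient_by_formula(B, x)
```

With a symmetric B the same code returns 2Bx, the special case noted in the text.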

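12. Differentiation of the inverse of X. It is possible to use implicit
differentiation to derive formulas for $\partial X^{-1}/\partial\langle X\rangle$ and $\partial\langle X^{-1}\rangle/\partial X$. We write
$I = XX^{-1}$ and get

        $\dfrac{\partial X}{\partial\langle X\rangle}X^{-1} + X\dfrac{\partial X^{-1}}{\partial\langle X\rangle} = 0,$

so that

(47)    $\dfrac{\partial X^{-1}}{\partial\langle X\rangle} = -X^{-1}JX^{-1},$

whence

(48)    $\dfrac{\partial\langle X^{-1}\rangle}{\partial X} = -(X^{-1})'K(X^{-1})'.$

The formula (47) is a generalization of a known matrix differential formula
[3: 3, 4].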
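A numerical illustration of (47) and (48) for a 2 by 2 matrix follows. It is a sketch only: exact rational arithmetic is used, the difference quotient with a small step h is compared with $-X^{-1}JX^{-1}$, and the helper names, the matrix X and the step size are assumptions for the example.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2 x 2 matrix in exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(i, j):
    return [[1 if (r, c) == (i, j) else 0 for c in range(2)] for r in range(2)]

def neg(M):
    return [[-v for v in row] for row in M]

X = [[2, 1], [1, 3]]
Xi = inv2(X)
h = Fraction(1, 10**6)
cells = [(i, j) for i in range(2) for j in range(2)]

# (47): derivative of the whole inverse with respect to one element x_mn
d47 = {(m, n): neg(matmul(matmul(Xi, unit(m, n)), Xi)) for m, n in cells}

max_err = Fraction(0)
for m, n in cells:
    Xh = [row[:] for row in X]
    Xh[m][n] += h
    Xhi = inv2(Xh)
    for i, j in cells:
        quotient = (Xhi[i][j] - Xi[i][j]) / h
        max_err = max(max_err, abs(quotient - d47[(m, n)][i][j]))

# (48): derivative of one element (p, q) of the inverse with respect to X
consistent = True
for p, q in cells:
    d48 = neg(matmul(matmul(transpose(Xi), unit(p, q)), transpose(Xi)))
    consistent = consistent and all(d48[m][n] == d47[(m, n)][p][q] for m, n in cells)
```

The difference quotient differs from (47) only by a term of order h, and (48) holds the same partial derivatives arranged by the position of $x_{mn}$ instead of by the position of the element of $X^{-1}$.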

In a similar way we derive 


(49) 

(50) 


dX 


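13. Differentiation of a function of a function. The theory developed in the
earlier sections is sufficiently general to be useful in differentiating a function of
a function if the functions involve addition, subtraction, premultiplication,
postmultiplication, and inverse. For example if

(51)    Y = Z'Z with Z = AX

we have

        $\dfrac{\partial Y}{\partial\langle X\rangle} = \dfrac{\partial Z'}{\partial\langle X\rangle}Z + Z'\dfrac{\partial Z}{\partial\langle X\rangle},$

and since

        $\dfrac{\partial Z'}{\partial\langle X\rangle} = J'A'$ and $\dfrac{\partial Z}{\partial\langle X\rangle} = AJ,$

(52)    $\dfrac{\partial Y}{\partial\langle X\rangle} = J'A'Z + Z'AJ,$

and thence

(53)    $\dfrac{\partial\langle Y\rangle}{\partial X} = A'ZK' + A'ZK.$

These results are equivalent to those of (42) and (43).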

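14. Differentiation of a power of a square matrix. The values of the symbolic
derivatives of $X^2$ and $X^3$ with respect to X are given in Tables I and II. It can
be shown similarly that if n is a positive integer

(54)    $\dfrac{\partial X^n}{\partial\langle X\rangle} = JX^{n-1} + \sum_{i=1}^{n-2} X^iJX^{n-i-1} + X^{n-1}J,$

and this can be written as

(55)    $\dfrac{\partial X^n}{\partial\langle X\rangle} = \sum_{i=0}^{n-1} X^iJX^{n-i-1},$

if we adopt the convention that $X^0$ is I. It follows at once that

(56)    $\dfrac{\partial\langle X^n\rangle}{\partial X} = \sum_{i=0}^{n-1} (X')^iK(X')^{n-i-1}.$

It is thence possible to derive formulas for the symbolic derivatives of $X^{-n}$.
Since $X^{-n}X^n = I$, we have

(57)    $\dfrac{\partial X^{-n}}{\partial\langle X\rangle}X^n + X^{-n}\left[\sum_{i=0}^{n-1} X^iJX^{n-i-1}\right] = 0,$

so

(58)    $\dfrac{\partial X^{-n}}{\partial\langle X\rangle} = -X^{-n}\left[\sum_{i=0}^{n-1} X^iJX^{n-i-1}\right]X^{-n},$

and

(59)    $\dfrac{\partial\langle X^{-n}\rangle}{\partial X} = -(X')^{-n}\left[\sum_{i=0}^{n-1} (X')^iK(X')^{n-i-1}\right](X')^{-n}.$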
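The sum in (55) can be compared with the derivative built up one factor at a time by the two-factor product rule, $\partial X^n = (\partial X^{n-1})X + X^{n-1}J$. The sketch below does this in exact integer arithmetic; the function names and the sample X and J are assumptions for illustration.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def add(P, Q):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(P, Q)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matpow(X, n):
    R = identity(len(X))            # convention: X^0 = I
    for _ in range(n):
        R = matmul(R, X)
    return R

def power_derivative(X, J, n):
    """(55): sum over i = 0 .. n-1 of X^i J X^(n-i-1)."""
    total = [[0] * len(X) for _ in X]
    for i in range(n):
        total = add(total, matmul(matmul(matpow(X, i), J), matpow(X, n - i - 1)))
    return total

def product_rule(X, J, n):
    """Derivative of X^n = X^(n-1) X, holding each factor constant in turn."""
    if n == 1:
        return J
    return add(matmul(product_rule(X, J, n - 1), X), matmul(matpow(X, n - 1), J))

X = [[1, 2], [3, 4]]
J = [[0, 1], [0, 0]]                # unit element in the first row, second column
agree = all(power_derivative(X, J, n) == product_rule(X, J, n) for n in range(1, 6))
table_I_row_6 = power_derivative(X, J, 2) == add(matmul(J, X), matmul(X, J))
```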

15. Applications. We consider the classical theory of least squares, a matrix
presentation of which is available in [2]. Suppose that y and $x_i$ are measured
from their means and that y is to be estimated from the n variables $x_i$. Form
the values of y into a column matrix Y and the values of $x_i$ into an N by n matrix
X. Introduce the column matrix B of n parameters $b_i$ and define

(60)    E = Y - XB.

Note that the matrix E'E is in this case the single element matrix which is the
sum of the squares of the residuals. Following the least squares method we
minimize this by differentiating with respect to the elements of B. We first
note that

(61)    E'E = (Y' - B'X')(Y - XB)
            = Y'Y - Y'XB - B'X'Y + B'X'XB.

Then we write down first

(62)    $\dfrac{\partial(E'E)}{\partial\langle B\rangle} = -Y'XJ - J'X'Y + J'X'XB + B'X'XJ,$

from which we get

(63)    $\dfrac{\partial\langle E'E\rangle}{\partial B} = -X'YK - X'YK' + X'XBK' + X'XBK$
                $= -X'(Y - XB)(K + K') = -X'E(K + K').$

The J's and K's are associated with B and E'E respectively. Here E'E is scalar
so that K = K' = 1 and we have

(64)    $\dfrac{\partial\langle E'E\rangle}{\partial B} = -2X'E.$

The equation X'E = 0, obtained by equating the right hand side of (64) to zero,
is a statement of the normal equations in matrix form.
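A small worked case may help fix (64) and the normal equations. The data below are invented for illustration (both columns of X and the vector Y are already measured from their means); the helper names are likewise assumptions. The gradient $-2X'E$ is compared with an exact central difference of E'E, and the solution of X'XB = X'Y is shown to give X'E = 0.

```python
from fractions import Fraction

X = [[-1, -1], [-1, 1], [1, -1], [1, 1]]   # N = 4 observations, n = 2 variables
Y = [-3, 0, 1, 2]

def residuals(B):
    """E = Y - XB."""
    return [Y[i] - sum(X[i][j] * B[j] for j in range(2)) for i in range(4)]

def Xt_times(v):
    """X'v for a column v of length N."""
    return [sum(X[i][j] * v[i] for i in range(4)) for j in range(2)]

def sum_sq(B):
    """E'E, the sum of squared residuals."""
    return sum(e * e for e in residuals(B))

# Normal equations X'X B = X'Y, solved by Cramer's rule
XtX = [[sum(X[i][r] * X[i][c] for i in range(4)) for c in range(2)] for r in range(2)]
XtY = Xt_times(Y)
det = Fraction(XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0])
B_hat = [(XtY[0] * XtX[1][1] - XtX[0][1] * XtY[1]) / det,
         (XtX[0][0] * XtY[1] - XtX[1][0] * XtY[0]) / det]
normal_eq = Xt_times(residuals(B_hat))      # X'E at the solution

# (64) at an arbitrary point, against a central difference (exact for a quadratic)
B0 = [Fraction(1), Fraction(-2)]
formula64 = [-2 * g for g in Xt_times(residuals(B0))]
numeric = []
for k in range(2):
    up, down = B0[:], B0[:]
    up[k] += 1
    down[k] -= 1
    numeric.append((sum_sq(up) - sum_sq(down)) / 2)
```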

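Equation (64) may also be obtained with the use of the methods of section
13. In this case

        $\dfrac{\partial E}{\partial\langle B\rangle} = -XJ, \qquad \dfrac{\partial E'}{\partial\langle B\rangle} = -J'X',$

and we have

(65)    $\dfrac{\partial(E'E)}{\partial\langle B\rangle} = \dfrac{\partial E'}{\partial\langle B\rangle}E + E'\dfrac{\partial E}{\partial\langle B\rangle} = -J'X'E - E'XJ,$

so

(66)    $\dfrac{\partial\langle E'E\rangle}{\partial B} = -X'EK' - X'EK = -X'E(K' + K).$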


The equation (64) is also applicable to the more general problem in which
$w_i$ and $v_i$ are estimated from the same set of variables $x_i$. The only change
needed is to regard Y, B, E as two-column matrices so that E'E is a matrix with
two rows and columns which we denote by

        $\begin{bmatrix}\epsilon_{11} & \epsilon_{12}\\ \epsilon_{21} & \epsilon_{22}\end{bmatrix}.$

We require $\partial\epsilon_{11}/\partial B = 0$ and $\partial\epsilon_{22}/\partial B = 0$. From equation (63), inserting subscripts,
we get

        $\dfrac{\partial\epsilon_{11}}{\partial B} = -X'E(K_{11} + K'_{11}) = -2X'EK_{11},$

        $\dfrac{\partial\epsilon_{22}}{\partial B} = -2X'EK_{22}.$

It is easily seen that $\partial\epsilon_{11}/\partial B = \partial\epsilon_{22}/\partial B = 0$ is equivalent to X'E = 0, the same equation
as we obtained in the last paragraph. We also arrive at the incidental result that
in minimizing $\Sigma e_1^2$ and $\Sigma e_2^2$ separately we find at the same time a stationary
value of $\Sigma e_1e_2$.

In this way we can treat two or more simultaneous regression problems with
this general notation as easily as we can treat one.

As a second application of the theory we outline the initial steps in the
direction of the formulas for canonical correlation [4], [5]. In this case A and B are
unknown column vectors with X and Y known rectangular matrices. Then
XA is a column matrix whose elements $z_i$ may be regarded as observed values
of a linear form z. Similarly YB = Λ, a column matrix whose elements $\lambda_i$ may be
regarded as observed values of a linear form λ. It is desired to find A and B such
that z and λ may have the largest correlation coefficient, and to find the size of
this coefficient. Then A'X'XA, B'Y'YB, and B'Y'XA = A'X'YB are scalars, and

(67)    $\rho = \dfrac{B'Y'XA}{\sqrt{(A'X'XA)(B'Y'YB)}}.$

If the scales of X and Y are chosen so that A'X'XA = 1 and B'Y'YB = 1, we
have

(68)    $\rho$ = B'Y'XA = A'X'YB.

Using Lagrange multipliers we set

(69)    $\phi = B'Y'XA + \dfrac{c}{2}(1 - A'X'XA) + \dfrac{d}{2}(1 - B'Y'YB),$

and differentiate with respect to the elements of A and B. We first differentiate
$\phi$ with respect to A after replacing B'Y'XA by A'X'YB:

(70)    $\dfrac{\partial\phi}{\partial\langle A\rangle} = J'X'YB - \dfrac{c}{2}(J'X'XA + A'X'XJ);$

(71)    $\dfrac{\partial\langle\phi\rangle}{\partial A} = X'YBK' - \dfrac{c}{2}(X'XAK' + X'XAK).$

(The J's and K's are associated with A and $\phi$, respectively.) We set $\partial\langle\phi\rangle/\partial A = 0$
with K = K' = 1 to get

(72)    X'YB = cX'XA,

whence by (67)

(73)    $\rho$ = A'X'YB = cA'X'XA = c,

and

(74)    X'YB = $\rho$X'XA.

Similar differentiation with respect to B gives $\rho$ = d and

(75)    Y'XA = $\rho$Y'YB.

The further steps in the development of canonical correlation theory are based
on (74) and (75).

A third application is to orthogonal regression. The situation is very similar
to that of the first illustration, but the errors are measured orthogonal to the
plane of best fit. As before we take the variates as measured from their means
and so have the basic equation

(76)    $d = \dfrac{b_1x_1 + b_2x_2 + \cdots + b_nx_n}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}.$

This can be written as

(77)    $D = l_1x_1 + l_2x_2 + \cdots + l_nx_n = XL$ with L'L = 1.

It follows that the quantity to be minimized is

(78)    D'D = L'X'XL.

With the use of Lagrange multipliers we have

(79)    $\Phi = L'X'XL + \lambda(1 - L'L)$

so that

(80)    $\dfrac{\partial\Phi}{\partial\langle L\rangle} = J'X'XL + L'X'XJ - \lambda(J'L + L'J),$

(81)    $\dfrac{\partial\langle\Phi\rangle}{\partial L} = X'XLK' + X'XLK - \lambda(LK' + LK)$

from which

(82)    $2X'XL - 2\lambda L = 0$

and the values can be determined from the equation

(83)    $(X'X - \lambda)L = 0.$

The solution continues with the use of the characteristic equation.

It is to be noted from (79) and (82) that

        $D'D = L'X'XL = \lambda L'L = \lambda$

so that (83) becomes

(84)    $(X'X - D'D)L = 0.$
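For two variables the characteristic equation behind (83) is a quadratic and can be solved directly. The following sketch uses invented data, measured from the means, and assumes the cross-product b is not zero so that the eigenvector can be written down in closed form. It checks that the smallest root λ gives a unit vector L with $(X'X - \lambda)L = 0$ and $D'D = \lambda$.

```python
import math

# Illustrative data: each column sums to zero
X = [[2.0, 1.0], [-1.0, -1.0], [-1.0, 0.0], [0.0, 0.0]]

# Elements of the symmetric matrix X'X = [[a, b], [b, c]]
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
c = sum(r[1] * r[1] for r in X)

# Smallest root of the characteristic equation (a - lam)(c - lam) - b^2 = 0
lam = ((a + c) - math.sqrt((a - c) ** 2 + 4 * b * b)) / 2

# Corresponding vector, normalised so that L'L = 1 (valid when b != 0)
v = [b, lam - a]
norm = math.hypot(v[0], v[1])
L = [v[0] / norm, v[1] / norm]

D = [r[0] * L[0] + r[1] * L[1] for r in X]     # D = XL, the orthogonal deviations
DtD = sum(d * d for d in D)                     # D'D, the quantity minimised

# Left side of (83), which should vanish
residual = [(a - lam) * L[0] + b * L[1], b * L[0] + (c - lam) * L[1]]
```

The larger root of the same quadratic gives the direction in which the sum of squared deviations is greatest.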

A fourth illustration uses symbolic derivatives in obtaining the principal
components of a total variance [5, 252]. The variable portion of the exponent of the
multivariate normal can be written Y'AY where Y is the column vector
$[y_1, \ldots, y_k]$ and A is a k by k matrix. We set this equal to a constant, say C,
and get the equation of the k dimensional ellipsoid. It is desired to locate the
extrema of this ellipsoid. To do this we find the extrema of Y'Y. Using the
Lagrange multiplier we have

(85)    $\phi = Y'Y + \lambda(C - Y'AY)$

so that

(86)    $\dfrac{\partial\phi}{\partial\langle Y\rangle} = J'Y + Y'J - \lambda(J'AY + Y'AJ),$

(87)    $\dfrac{\partial\langle\phi\rangle}{\partial Y} = YK' + YK - \lambda(AYK' + AYK),$

so that there results

(88)    $Y - \lambda AY = 0.$

Pre-multiplying by $A^{-1}$ we get

(89)    $(A^{-1} - \lambda)Y = 0$

and pre-multiplying by Y' gives the important relation

(90)    $Y'Y = \lambda C.$

A fifth illustration utilizes symbolic differentiation in developing the theory
of the linear discriminant function [6, 341] [8, 124]. As in the other illustrations,
the variates are measured about their means. The unknown multipliers are
indicated by the vector L. Then

(91)    Z = XL

is the general matrix equation while

(92)    $Z_1 = X_1L$
        $Z_2 = X_2L$

are the corresponding equations for the two groups. Then

(93)    $\bar{Z}_1 = \bar{X}_1L$, $\bar{Z}_2 = \bar{X}_2L$, and $\bar{Z}_1 - \bar{Z}_2 = (\bar{X}_1 - \bar{X}_2)L = DL$,

(94)    $Z_1 - \bar{Z}_1 = (X_1 - \bar{X}_1)L = Y_1L,$
        $Z_2 - \bar{Z}_2 = (X_2 - \bar{X}_2)L = Y_2L.$

The within group variation, $L'Y_1'Y_1L + L'Y_2'Y_2L$, is then divided into the
between group variation, L'D'DL, to get

(95)    $G = \dfrac{L'D'DL}{L'Y_1'Y_1L + L'Y_2'Y_2L} = \dfrac{A}{B}.$

We wish to maximize G. Since A and B are scalars $\partial\langle G\rangle/\partial L = 0$ reduces to

(96)    $\dfrac{\partial\langle B\rangle}{\partial L} = \dfrac{1}{G}\dfrac{\partial\langle A\rangle}{\partial L},$

which becomes, with further differentiation

(97)    $(Y_1'Y_1 + Y_2'Y_2)L = D'\left(\dfrac{DL}{G}\right).$

Since DL/G is a scalar, we have

(98)    $(Y_1'Y_1 + Y_2'Y_2)L = cD'.$

Any convenient value of c can be used for purposes of discrimination. It is
customary to take c = 1 and then to adjust (98) so that some $l_i$ is unity.

A final illustration applies symbolic matrix differentiation to a theorem of
multiple factor analysis. This presentation parallels that given by Thurstone
[7, 473-477] for transforming any factorial matrix into a principal axes matrix.
The matrix

(99)    $F = [a_{ij}]$

has p rows and r columns, r < p, such that

(100)   FF' = R

where R is a p × p correlation matrix.

It is desired to apply the unitary orthogonal transformation L to F in such a
way as to produce a matrix, called $F_p$, which has the sums of the squares in
respective columns a maximum. This can be done by maximizing simultaneously
the diagonal terms of $F_p'F_p$ where

(101)   $F_p = FL.$

Again using Lagrange multipliers, we have

(102)   $\phi = L'F'FL + \lambda(I - L'L).$

This equation has the same analytical form as (79). Differentiation leads to
the result

(103)   $(F'F - \lambda)L = 0.$

The solution of (103) gives the value L which can be substituted in (101) to
obtain $F_p$.

16. Conclusion. Two types of symbolic matrix derivatives have been defined.
Laws have been developed for the basic operations of addition, subtraction,
multiplication, inverse, and powers. Laws for more extended functions
can be worked out on the basis of principles enunciated.

Applications are given to certain multivariate problems. It is our thesis that
with these differentiation formulas available, much work in multivariate analysis
can be carried on with a simple matrix notation.

REFERENCES

[1] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices and Some
Applications to Dynamics and Differential Equations, Cambridge University Press,
1938.

[2] P. S. Dwyer, "A matrix presentation of least squares and correlation theory," Annals
of Math. Stat., Vol. 15 (1944), pp. 82-89.

[3] A. D. Michal, Matrix and Tensor Calculus, New York, John Wiley and Sons, 1947.

[4] H. Hotelling, "Relations between two sets of variates," Biometrika, Vol. 28 (1936),
pp. 321-377.

[5] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.

[6] M. G. Kendall, Advanced Theory of Statistics, Vol. II, London, Charles Griffin and Co.,
Ltd., 1946.

[7] L. L. Thurstone, Multiple Factor Analysis, Chicago, University of Chicago Press, 1947.

[8] P. G. Hoel, Introduction to Mathematical Statistics, New York, John Wiley and Sons,
1947.



ON THE LIMITING DISTRIBUTIONS OF ESTIMATES BASED ON
SAMPLES FROM FINITE UNIVERSES¹

By William G. Madow
Institute of Statistics, University of North Carolina

1. Summary. The paper shows that under very broad conditions the usual
theorems concerning the limiting distributions of estimates hold for estimates
based on samples selected from finite universes, at random without replacement.
It may be remarked that under the same conditions, the same conclusions are
true for random sampling from finite universes with replacement, if the universes
are permitted to change within the limitations set by condition W.

2. Introduction. It has long been known that the limiting distribution of
arithmetic means of samples selected at random with replacement from finite
universes, or from infinite universes, is normal under very general conditions.
When, however, a sample is selected from a finite universe without replacement,
and the size of the sample as compared with that of the universe is too large for
the universe to be treated as infinite, the proof that the limiting distribution of
the mean is normal appears to have been given only for the case where the
universe is multinomial.² In this paper we prove that the limiting distribution of
the mean is normal provided only that as the universe increases in size, the higher
moments do not increase too rapidly as compared with the variance, and that
for sufficiently large sizes of sample and population the ratio of size of sample to
size of universe is bounded away from 1. Various extensions are given, but these
are almost immediate consequences of the theorem on the limiting distribution
of the mean.

The method used is that of showing that the moments of the standardized mean
tend to those of the normal distribution. In doing this we generalize a theorem
of Wald and Wolfowitz,³ by making it applicable to permutations of samples
from finite populations, and by reducing a little the conditions on the coefficients.
The theorem on the mean is then a simple corollary.

We also note that with these proofs on limiting distributions we can make the
corresponding assertions concerning characteristic functions. Although no
applications of this fact are given, it seems likely that some useful results could
be obtained.

3. Preliminary lemmas. In calculating the k-th moments and their limits we
shall use an infrequently given form of the multinomial expansion and some
properties of symmetric polynomials. In this section we make the necessary
definitions, and present four lemmas embodying the results we shall use.⁴

¹ Presented to the American Mathematical Society at a meeting held in New York City
on April 17, 1948.

² See F. N. David, "Limiting distributions connected with certain methods of sampling
human populations," Stat. Res. Mem., Vol. 2 (1938), pp. 69-90, especially p. 77.

³ A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the
observations," Annals of Math. Stat., Vol. 15 (1944), pp. 358-372, especially p. 359.

A t-partition of a positive integer k consists of t positive integers $\alpha_1, \ldots, \alpha_t$
such that $\alpha_1 + \cdots + \alpha_t = k$. Two t-partitions $\alpha_1, \ldots, \alpha_t$ and $\beta_1, \ldots, \beta_t$ of
k will be said to be distinct if for at least one value of h we have $\alpha_h \ne \beta_h$.

Let $\varphi(\alpha_1, \ldots, \alpha_t)$, written $\varphi(\alpha)$, be any function of the t-partitions of k. By
$\Sigma_{11}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha_1, \ldots, \alpha_t)$ over all distinct
t-partitions of k.

By $\Sigma_{12}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct permutations
of $\alpha_1, \ldots, \alpha_t$.

By $\Sigma_{13}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct t-partitions
of k satisfying the condition $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_t$.

Let $\psi(\nu_1, \ldots, \nu_t)$ be any function of the variables $\nu_1, \ldots, \nu_t$. Then by
$\Sigma_{21}\psi(\nu_1, \ldots, \nu_t)$ we shall mean the summation of $\psi(\nu_1, \ldots, \nu_t)$ over all possible
selections of t integers from 1 to n arranged so that $\nu_1 > \nu_2 > \cdots > \nu_t$.

The formula for the multinomial given below is not presented as a new result.
It is given only as a means of referring to the result we need.

Lemma 1. Let $\xi_1, \ldots, \xi_n$ be any quantities or random variables and let k be a
positive integer. Then

        $(\xi_1 + \cdots + \xi_n)^k = \sum_{t}\Sigma_{11}\,C_{\alpha_1\cdots\alpha_t}\,\Sigma_{21}\,\xi_{\nu_1}^{\alpha_1}\cdots\xi_{\nu_t}^{\alpha_t},$

where

        $C_{\alpha_1\cdots\alpha_t} = \dfrac{k!}{\alpha_1!\cdots\alpha_t!}.$

The proof is omitted.
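The form of the multinomial expansion in Lemma 1 can be checked by direct enumeration. In the sketch below the sum runs over the number t of distinct variables in a term, over all ordered t-partitions of k, and over all selections of t indices. The index sets are generated in increasing rather than decreasing order, which is an assumption of convenience; it does not change the total because every ordered partition is included. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def compositions(k, t):
    """All ordered t-tuples of positive integers with sum k (the t-partitions)."""
    if t == 1:
        yield (k,)
        return
    for first in range(1, k - t + 2):
        for rest in compositions(k - first, t - 1):
            yield (first,) + rest

def multinomial_sum(xs, k):
    """Right-hand side of Lemma 1 for the quantities xs and the power k."""
    n = len(xs)
    total = 0
    for t in range(1, min(k, n) + 1):
        for alpha in compositions(k, t):
            coef = factorial(k)
            for a in alpha:
                coef //= factorial(a)          # k! / (alpha_1! ... alpha_t!)
            for nu in combinations(range(n), t):
                term = coef
                for a, v in zip(alpha, nu):
                    term *= xs[v] ** a
                total += term
    return total

lhs = (2 + 3 + 5) ** 4
rhs = multinomial_sum([2, 3, 5], 4)
```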

The following lemma will be useful in connection with several of the results of
this section:

Lemma 2. If $\varphi(\alpha)$ is a function of the t-partitions of k, then

        $\Sigma_{11}\varphi(\alpha) = \Sigma_{13}\Sigma_{12}\varphi(\alpha).$

The verbalization of the lemma is practically its proof.

Let us now define certain symmetric polynomials that we shall use.
Let $S_{\alpha_1,\ldots,\alpha_t} = \sum x_{\nu_1}^{\alpha_1}\cdots x_{\nu_t}^{\alpha_t}$ where the α's are positive integers and the
summation extends over all possible arrangements $\nu_1, \ldots, \nu_t$ of t of the integers
1, ..., N. Hence there will be $N^{(t)} = N(N - 1)\cdots(N - t + 1)$ terms in
$S_{\alpha_1,\ldots,\alpha_t}$.

Lemma 3. Suppose that $t_1, \ldots, t_h$ are an h-partition of t, that

        $\alpha_{t_1+\cdots+t_{i-1}+1} = \cdots = \alpha_{t_1+\cdots+t_i}, \qquad (i = 1, \ldots, h;\ t_0 = 0),$

and that

        $\alpha_{t_1} \ne \alpha_{t_1+t_2} \ne \cdots \ne \alpha_{t_1+\cdots+t_h}.$

Then, defining

(3.1)   $S'_{\alpha_1,\ldots,\alpha_t} = \Sigma_{12}\Sigma_{21}\,x_{\nu_1}^{\alpha_1}\cdots x_{\nu_t}^{\alpha_t},$

it follows that

        $S_{\alpha_1,\ldots,\alpha_t} = t_1!\cdots t_h!\,S'_{\alpha_1,\ldots,\alpha_t}.$

To prove Lemma 3, it is only necessary to note that each term of $S'_{\alpha_1,\ldots,\alpha_t}$ will
determine $t_1!\cdots t_h!$ equal terms of $S_{\alpha_1,\ldots,\alpha_t}$.

Although the moments that we shall obtain will be functions of $S_{\alpha_1,\ldots,\alpha_t}$, the
condition that we shall use on the moments can be interpreted directly only in
terms of $S_r$. Consequently, in order to be able to analyze the implications of
that condition on $S_{\alpha_1,\ldots,\alpha_t}$, we state the following lemma:

⁴ The order of sections 3 and 4 is largely a matter of taste; some may prefer to treat
section 3 as an appendix to section 4 to be referred to when necessary.

Lemma 4. The symmetric polynomial $S_{\alpha_1,\ldots,\alpha_t}$ is equal to a sum of products of
the form

        $\pm S_{\gamma_1}\cdots S_{\gamma_h},$

where $\gamma_1, \ldots, \gamma_h$ are an h-partition of k, $h \le t$, and each γ is a sum of one or more
of the α's. Furthermore, if $S_1 = 0$, then $h \le [k/2]$ where [k/2] = k/2 if k is even
and [k/2] = (k - 1)/2 if k is odd. This follows from the result

(3.2)   $S_{\alpha_t}S_{\alpha_1,\ldots,\alpha_{t-1}} = S_{\alpha_1,\ldots,\alpha_t} + S_{\alpha_1+\alpha_t,\alpha_2,\ldots,\alpha_{t-1}} + \cdots + S_{\alpha_1,\ldots,\alpha_{t-2},\alpha_{t-1}+\alpha_t}.$

Proof: It is easy to prove (3.2) by comparing terms. Then the other assertions
follow from the repeated use of (3.2) and the resulting fact that each γ is a
sum of one or more of the α's.
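The identity (3.2) can be confirmed by brute force on a few numbers. The sketch below takes $S_{\alpha_1,\ldots,\alpha_t}$ to be the sum over all ordered selections of t distinct indices, as defined above, and splits off the last exponent; the function names and the sample values are assumptions for the example.

```python
from itertools import permutations

def S(xs, alpha):
    """Sum of x_{v1}^{a1} ... x_{vt}^{at} over all arrangements of t distinct indices."""
    total = 0
    for nu in permutations(range(len(xs)), len(alpha)):
        term = 1
        for a, v in zip(alpha, nu):
            term *= xs[v] ** a
        total += term
    return total

def rhs_32(xs, alpha):
    """S_alpha plus the terms in which the last exponent is merged into an earlier one."""
    head, last = list(alpha[:-1]), alpha[-1]
    total = S(xs, alpha)
    for i in range(len(head)):
        merged = head[:]
        merged[i] += last
        total += S(xs, merged)
    return total

xs = [1, -2, 3, 4]
alpha = (2, 1, 3)
lhs = S(xs, alpha[-1:]) * S(xs, alpha[:-1])
rhs = rhs_32(xs, alpha)

# When the values sum to zero, S_1 = 0 and the left side vanishes
centered = [3, -1, -2, 0]
zero_case = rhs_32(centered, (2, 2, 1))
```

Repeated use of the same step is what reduces every $S_{\alpha_1,\ldots,\alpha_t}$ to products of the single-index sums $S_\gamma$.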


4. The limiting distribution. In this section we obtain the generalization of
the theorem of Wald and Wolfowitz to which reference was made above.

Let $U_1, U_2, \ldots, U_N, \ldots$ be a sequence of universes, the universe $U_N$
containing the elements⁵ $x_{\nu N}$ and let the arithmetic mean of the elements of $U_N$ be
denoted by $\bar{x}_N$. Furthermore, let

        $\mu_{rN} = \mu_r(U_N) = \dfrac{1}{N}\sum_{\nu}(x_{\nu N} - \bar{x}_N)^r.$

Let $C_1, C_2, \ldots, C_n, \ldots$ be a sequence of sets of coefficients, the set $C_n$
containing the elements $c_{jn}$ and let the arithmetic mean of the elements of $C_n$ be
denoted by $\bar{c}_n$. We exclude the possibility that the elements of any $C_n$ all vanish,
and hence we can suppose that $\sum_j c_{jn}^2 = 1$. Furthermore, let $\mu'_{rn}$ denote the
corresponding moments of the elements of $C_n$.

⁵ The letter ν will assume all integral values from 1 to N. The letter r will assume all
positive integral values. The letter j will assume all integral values from 1 to n. The letter
t will assume all integral values from 1 to k. The symbol lim will stand for the limit as n or
N or both, as the case may be, increase without limit, it being understood that lim n/N < 1.

Since $\sum_j (c_{jn} - \bar{c}_n)^2 \ge 0$, it follows that, if we define $A_n = n^{1/2}\bar{c}_n$, then $A_n^2 \le 1$.

Let n elements be selected at random without replacement from $U_N$ and let
us denote these elements by $x'_{iN}$, the subscript i indicating the order of selection,
i.e., $x'_{iN}$ is the i-th element of $U_N$ selected for the sample even though it may be
$x_{\nu N}$.

The linear function that we shall study is

        $z_n = c_{1n}x'_{1N} + \cdots + c_{nn}x'_{nN},$

i.e., the value of $z_n$ is determined by multiplying the i-th element selected for the
sample by $c_{in}$ and summing for i. Then, since $Ex'_{iN} = \bar{x}_N$, we have

        $Ez_n = n\bar{c}_n\bar{x}_N.$

Furthermore,

        $\sigma_{z_n}^2 = \dfrac{N}{N-1}\,\mu_{2N}\left(1 - \dfrac{nA_n^2}{N}\right).$

To see this we first note that

        $\sum_{i \ne j} c_{in}c_{jn} = nA_n^2 - 1,$

and, if $i \ne j$,

        $E(x'_{iN} - \bar{x}_N)(x'_{jN} - \bar{x}_N) = -\dfrac{\mu_{2N}}{N-1}.$

From the definition of variance we have

        $\sigma_{z_n}^2 = E(z_n - Ez_n)^2 = \sum_{i,j} c_{in}c_{jn}E(x'_{iN} - \bar{x}_N)(x'_{jN} - \bar{x}_N),$

and making the indicated substitutions the result follows from a few simple
manipulations.

If we define $\bar{x}_n$ to be the arithmetic mean of $x'_{1N}, \ldots, x'_{nN}$, then it follows that
$\sqrt{n}\,c_{jn} = 1$ and, as is well known,

        $E\bar{x}_n = \bar{x}_N.$

Hence, if we can find the limiting distribution of

        $Z_n = \dfrac{z_n - Ez_n}{\sigma_{z_n}},$

then the limiting distribution of $(\bar{x}_n - \bar{x}_N)/\sigma_{\bar{x}_n}$ will be a special case.
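The mean and variance of the linear function $z_n$ under sampling without replacement can be confirmed by enumerating every ordered sample from a small universe. The sketch below does not assume the normalization $\sum c_{jn}^2 = 1$; it checks the general form $\mu_{2N}[\sum c^2 - ((\sum c)^2 - \sum c^2)/(N-1)]$, which reduces to $\mu_{2N}(N - (\sum c)^2)/(N-1)$ when the coefficients are normalized. The universe, the coefficients and the function names are all invented for illustration.

```python
from fractions import Fraction
from itertools import permutations

def moments_by_enumeration(universe, c):
    """Exact mean and variance of z = sum c_i x'_i over all ordered samples
    of len(c) elements drawn without replacement."""
    samples = list(permutations(universe, len(c)))
    z = [sum(ci * xi for ci, xi in zip(c, s)) for s in samples]
    mean = Fraction(sum(z), len(z))
    var = Fraction(sum(v * v for v in z), len(z)) - mean ** 2
    return mean, var

def moments_by_formula(universe, c):
    """Closed forms from the covariance -mu_2/(N - 1) between distinct draws."""
    N = len(universe)
    xbar = Fraction(sum(universe), N)
    mu2 = sum((x - xbar) ** 2 for x in universe) / N
    sum_c = sum(c)
    sum_c2 = sum(ci * ci for ci in c)
    mean = sum_c * xbar
    var = mu2 * (sum_c2 - (sum_c ** 2 - sum_c2) / (N - 1))
    return mean, var

universe = [1, 2, 4, 7, 11]                       # N = 5
c = [Fraction(1, 2), Fraction(-1, 3), Fraction(2)]  # n = 3
enumerated = moments_by_enumeration(universe, c)
closed_form = moments_by_formula(universe, c)
```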




We shall need to place some sort of limitation on the sequences $U_N$ and $C_n$ if
we are to obtain theorems on limiting distributions of statistics based on them.
The condition W that we shall use is satisfied by a slightly larger class of
sequences $U_N$ and $C_n$ than that of Wald and Wolfowitz because it does not rule out
the possibility that all the elements of $C_n$ should be equal. It should be noted
however, that for their purposes this extension of the class of sequences
$U_N$ and $C_n$ is vacuous since they required n = N, so that in their case if all the
elements of $C_N$ were equal, say k/N, we would have $z_N = k\bar{x}_N$ no matter in what
order the elements of $U_N$ were selected for the sample.

Condition W. The sequences $U_N$ and $C_n$ will satisfy the condition W if

        $\mu_{rN} = \mu_{2N}^{r/2}\lambda_r(N),$

for sufficiently large n and N, where a finite value λ exists such that for all r

        $\sup |\lambda_r(N)| < \lambda,$
        $\sup |\lambda'_r(n)| < \lambda,$

and ε > 0.

(Note that if W is satisfied for all even values of r then W is also satisfied for all
odd values of r since $\mu_{r+2}\,\mu_r \ge \mu_{r+1}^2$.)

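A general theorem on moments is the following:

Theorem 1. Let $S_{\alpha_1,\ldots,\alpha_t}$ and $S'_{\alpha_1,\ldots,\alpha_t}$ be defined in terms of $x_{\nu N} - \bar{x}_N$
instead of $x_\nu$ and let $T'_{\alpha_1,\ldots,\alpha_t}$ be the same function of the $c_{jn}$ that $S'_{\alpha_1,\ldots,\alpha_t}$ is of the
$x_{\nu N} - \bar{x}_N$. Furthermore, let $E_k = EZ_n^k$. Then

(4.1)   $E_k = \sum_{t}\Sigma_{13}\,C_{\alpha_1\cdots\alpha_t}\,\dfrac{S_{\alpha_1,\ldots,\alpha_t}\,T'_{\alpha_1,\ldots,\alpha_t}}{N^{(t)}\sigma_{z_n}^k}.$

Proof: From the definition of $Z_n$ and Lemma 1, it follows that

        $\sigma_{z_n}^kE_k = \sum_{t}\Sigma_{11}\,C_{\alpha_1\cdots\alpha_t}\,\Sigma_{21}\,c_{\nu_1n}^{\alpha_1}\cdots c_{\nu_tn}^{\alpha_t}\,E(x'_{\nu_1N} - \bar{x}_N)^{\alpha_1}\cdots(x'_{\nu_tN} - \bar{x}_N)^{\alpha_t}.$

Since we are selecting at random without replacement it follows that

        $N^{(t)}E(x'_{\nu_1N} - \bar{x}_N)^{\alpha_1}\cdots(x'_{\nu_tN} - \bar{x}_N)^{\alpha_t} = S_{\alpha_1,\ldots,\alpha_t}.$

If we now use Lemma 2 to replace $\Sigma_{11}$ by $\Sigma_{13}\Sigma_{12}$ we then obtain

        $\sigma_{z_n}^kE_k = \sum_{t}\Sigma_{13}\,C_{\alpha_1\cdots\alpha_t}\,\dfrac{S_{\alpha_1,\ldots,\alpha_t}}{N^{(t)}}\,\Sigma_{12}\Sigma_{21}\,c_{\nu_1n}^{\alpha_1}\cdots c_{\nu_tn}^{\alpha_t},$

since both $C_{\alpha_1\cdots\alpha_t}$ and $S_{\alpha_1,\ldots,\alpha_t}$ are invariant under permutations of $\alpha_1, \ldots, \alpha_t$.
Then from (3.1) and the definition of $T'_{\alpha_1,\ldots,\alpha_t}$, it follows that (4.1) is
proved.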



Our fundamental theorem is:

Theorem 2. If the sequences $U_N$ and $C_n$ satisfy the condition W, then

        $\lim E_{2j+1} = 0,$

and

        $\lim E_{2j} = \dfrac{(2j)!}{2^j\,j!},$

so that, for any a,

        $\lim P\{Z_n < a\} = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx.$

Proof: We wish to show that lim $E_k$ exists and has the values given above.
First consider the parts of the typical term of $E_k$ that depend on n and N, i.e.,
the expression

<J 

B = 


t' 

* ttl. 


(1 - nAl/N) 


m 


Since lim $E_k$ will be the sum of the limits of a finite number of these terms, let
us first determine under what conditions B will tend to zero as n and N become
infinite.

From Lemma 4 it follows that

        $S_{\alpha_1,\ldots,\alpha_t} = \sum \pm S_{\gamma_1}\cdots S_{\gamma_h},$

where $\gamma_1 + \cdots + \gamma_h = \alpha_1 + \cdots + \alpha_t$ and each of the γ's is the sum of one or more
of the α's. From the definition of $S_{\alpha_1,\ldots,\alpha_t}$ in terms of $x_{\nu N} - \bar{x}_N$ it follows that
$S_1 = 0$. Hence the minimum value of all γ's in any non-vanishing term of the
summation is 2. Consequently we can say that for all non-vanishing terms $h \le [k/2]$
and $h \le t$. Finally if condition W is satisfied then

        $S_{\gamma_1}\cdots S_{\gamma_h} = N^h\mu_{2N}^{k/2}\lambda(N),$

where

        $\sup |\lambda(N)| < \infty.$

Similarly

        $T_{\alpha_1,\ldots,\alpha_t} = \sum \pm T_{\gamma_1}\cdots T_{\gamma_g},$

where it may be that $T_1 \ne 0$ so that we cannot require $g \le [k/2]$ for the term
$T_{\gamma_1}\cdots T_{\gamma_g}$ to be non-vanishing. We still have, however, from Lemma 4 that
$g \le t$.

If condition W is satisfied, then

        $T_{\gamma_1}\cdots T_{\gamma_g} = n^{g-k/2}\lambda'(n),$

where

        $\sup |\lambda'(n)| < \infty.$

Hence, from Lemma 4, the definitions of $\mu_{rN}$ and $\mu'_{rn}$ and condition W it follows
that B is a sum (the number of terms does not depend on n or N) of terms like

j, _X(j\r)x'f7i) 

where

        $h \le [k/2], \quad h \le t, \quad g \le t,$

and

        $\sup |\lambda(N)| < \infty,$ and $\sup |\lambda'(n)| < \infty.$

Since $h \le t$, it follows that if g < k/2 then lim D = 0. Hence, a possibly
non-vanishing term must have $g \ge k/2$ and hence $t \ge k/2$ because $t \ge g$. Furthermore,
$t \ge g + h - k/2$, since $h - k/2 \le 0$ and $t \ge g$. Hence $t - h \ge g - k/2$.
Now, we can write

        $D = \dfrac{n^{g-k/2}}{N^{t-h}}\,\lambda(N, n),$

where

        $\sup |\lambda(N, n)| < \infty,$

since $n/N < 1 - \epsilon$ for sufficiently large n and N.

Hence

        $\lim D = 0,$

unless, perhaps, when g - k/2 = t - h, i.e., h - k/2 = t - g. Since $h - k/2 \le 0$
and $t - g \ge 0$, it follows that we must have h = k/2 and t = g for lim D to be
possibly not zero.

If k is odd, then $h \le (k - 1)/2$ and hence

        $\lim E_{2j+1} = 0,$

since all terms obtained by expanding it as above will tend to zero.

If k is even, say k = 2j, and lim D is possibly non-vanishing, then h must equal
j and we must have $\gamma_1 = \cdots = \gamma_j = 2$. Consequently, from Lemma 4, the
only possibly non-vanishing terms of $E_k$ are those arising from the polynomials
$S_{\alpha_1,\ldots,\alpha_t}$, $T'_{\alpha_1,\ldots,\alpha_t}$ with $\alpha_1 = \cdots = \alpha_s = 2$, and $\alpha_{s+1} = \cdots = \alpha_t = 1$, so that
2s + t - s = 2j or t = 2j - s, s = 0, 1, ..., j. For such values of $\alpha_1, \ldots, \alpha_t$
we have

        $C_{\alpha_1\cdots\alpha_t} = \dfrac{(2j)!}{2^s}.$

Furthermore, as shown below, in developing $S_{\alpha_1,\ldots,\alpha_t}$ by means of Lemma 4 the
coefficient of $S_2^j$ is

(4.2)   $(-1)^{j-s}\,\dfrac{(2j - 2s)!}{2^{j-s}(j - s)!}.$

Demonstration of (4.2): If s = j then it follows from Lemma 4 that the
coefficient of $S_2^j$ is 1. If s < j we use Lemma 4, and noting that $S_1 = 0$, we
obtain

        $0 = S_{\alpha_t}S_{\alpha_1,\ldots,\alpha_{t-1}} = S_{\alpha_1,\ldots,\alpha_t} + S_{\alpha_1+\alpha_t,\alpha_2,\ldots,\alpha_{t-1}} + \cdots + S_{\alpha_1,\ldots,\alpha_{t-2},\alpha_{t-1}+\alpha_t},$

where, since $\alpha_t = 1$, we have $\alpha_1 + \alpha_t = \alpha_2 + \alpha_t = \cdots = \alpha_s + \alpha_t = 3$, and
$\alpha_{s+1} + \alpha_t = \cdots = \alpha_{t-1} + \alpha_t = 2$. Consequently of the t - 1 terms of the
above evaluation of $S_{\alpha_1,\ldots,\alpha_t}$, exactly s will have α's > 2 and t - s - 1 will be
of the same form as $S_{\alpha_1,\ldots,\alpha_t}$ except that instead of s of the α's being 2 we have
s + 1 of the α's equal 2. For each such s we repeat the process obtaining

        $S_{\alpha_1,\ldots,\alpha_t} = (-1)^{j-s}(t - s - 1)(t - s - 3)\cdots 3\cdot 1\cdot S_{2,\ldots,2}$
                + terms which have h < j.

Consequently (4.2) provides the coefficient of $S_2^j$ in $S_{\alpha_1,\ldots,\alpha_t}$. Since the other
terms of $S_{\alpha_1,\ldots,\alpha_t}$ have h < j, they lead to terms of $E_k$ that vanish in the limit.
Furthermore, by Lemma 3, $T_{\alpha_1,\ldots,\alpha_t} = T'_{\alpha_1,\ldots,\alpha_t}\,s!\,(t - s)!$ and the only term
of $T_{\alpha_1,\ldots,\alpha_t}$ for which g = t is

        $T_2^s\,T_1^{t-s}.$

The other terms of $T_{\alpha_1,\ldots,\alpha_t}$ will lead to terms of $E_k$ that vanish in the limit since
g < t. Consequently, eliminating terms known to tend to zero as n and N
become infinite, we see that $E_{2j} - f(n, N)$ tends to zero as n and N become infinite,
where


Kn, N) 


f M ( - 1)'-* _ 2s)! 


Now as n and N become infinite with n < N, we see that

        $\lim f(n, N) = \lim \sum_{s=0}^{j} (-1)^{j-s}\,\dfrac{(2j)!}{2^j\,s!\,(j - s)!}\,\dfrac{(nA_n^2/N)^{j-s}}{(1 - nA_n^2/N)^{j}} = \dfrac{(2j)!}{2^j\,j!},$

i.e.,

        $\lim E_{2j} = \dfrac{(2j)!}{2^j\,j!}.$

To complete the proof it is only necessary to note that the normal distribution is
completely determined by its moments.⁶

⁶ See for example, M. G. Kendall, The Advanced Theory of Statistics, Vol. I, London,
Charles Griffin and Company, page 110.




LIMITING DISTEIBBtIONS OF ESTIMATES 


543 


Since Theorem 2 is a generalization of the Theorem of Wald and Wolfoivitz 
it IS possible to generalize slightly all ttie applications they make of their theorem! 
The statements of those generalizations are omitted 
The applic_ation of Theorem 2 that led to this paper is the following: Suppose 
that Cjii n • Then the secjiience On. satisfies W and/t^ = 1. Consequently 
we have proved 

CoEOLLAEY 1 , If the sequence Vn satisfies the condition W and if is the anih- 
nieiic mean of a sample of n elements selected at random without replacement from 
JJn , then, for all a, 


lim P 


n'-I^Xn — Xff) 


— m/n) 


\ <. a 




-xijl 


dx. 


provided that « > 0 exists such thatn/N <1 — e,ifn andN are sufficiently large. 

Now the sequence of $U_N$ will certainly satisfy W if $U_N$ has the same moments 
for all values of N, or if the moments of $U_N$ tend to fixed values as N increases, 
or if the universe $U_N$ is a random sample of a universe having these properties. 
Consequently Theorem 1 and its corollaries will be valid for many applications, 
among them being the case studied by F. N. David⁷ when $U_N$ has the same 
multinomial distribution for each value of N. 

The condition W is immediately satisfied for large classes of changing 
universes. For example, if the elements of all $U_N$ are uniformly bounded and 

$$\lim \mu_{2N} > 0,$$

then the condition W is satisfied. As an illustration, consider the case where 
$U_N$ contains $Np_N$ elements having the value one and $N(1 - p_N)$ elements having 
the value zero. Then 

$$\mu_{2N} = p_N(1 - p_N),$$

and 

$$\mu_{rN} = \frac{1}{N}\left[\sum_{\nu=1}^{Np_N} (1 - p_N)^r + \sum_{\nu=Np_N+1}^{N} (-p_N)^r\right] = p_N(1 - p_N)^r + (-1)^r (1 - p_N)p_N^r.$$

Hence 

$$\frac{\mu_{rN}}{\mu_{2N}^{r/2}} = \frac{(1 - p_N)^{r/2}}{p_N^{r/2-1}} + (-1)^r \frac{p_N^{r/2}}{(1 - p_N)^{r/2-1}},$$

so that condition W will be satisfied if $\epsilon > 0$ exists such that $\epsilon < p_N < 1 - \epsilon$ 
for all sufficiently large N. 

Hence the limiting distribution of $Z_n$ will be normal no matter how the 
proportions $p_N$ change, provided only that the universe $U_N$ does not come to consist 
essentially only of zeros or only of ones. 
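
The moment ratio in this illustration is easy to tabulate. The following sketch, written under the assumption that the r-th central moment of the zero-one universe is p(1 - p)^r + (-1)^r (1 - p)p^r as derived above, computes the moments directly from the universe and forms the standardized ratio for even r, where it is rational. The function names are hypothetical.

```python
from fractions import Fraction


def central_moment(ones, N, r):
    """r-th central moment of a universe with `ones` ones and N - ones zeros."""
    p = Fraction(ones, N)
    return (ones * (1 - p) ** r + (N - ones) * (-p) ** r) / N


def closed_form(p, r):
    """p(1-p)^r + (-1)^r (1-p) p^r, the closed form of the same moment."""
    return p * (1 - p) ** r + (-1) ** r * (1 - p) * p ** r


def standardized_even_moment(ones, N, r):
    """mu_r / mu_2^(r/2) for even r, kept exact with rational arithmetic."""
    if r % 2:
        raise ValueError("r must be even for an exact rational ratio")
    return central_moment(ones, N, r) / central_moment(ones, N, 2) ** (r // 2)


for ones in (50, 10, 1):  # proportion of ones: 1/2, 1/10, 1/100 with N = 100
    print(ones, float(standardized_even_moment(ones, 100, 4)))
```

The ratio stays bounded while the proportion of ones is bounded away from 0 and 1, and it grows without limit as the universe comes to consist almost entirely of zeros, which is the behaviour condition W excludes.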


⁷ Op. cit. 





Various multivariate extensions of Theorem 2 are immediate. For example: 
THEOREM 3. Suppose that the elements of $U_N$ are vectors of two components⁸ 
$(x_{\nu N1}, x_{\nu N2})$, and that the condition W is satisfied by the sequences $C_n$, $U_{N1}$, and 
$U_{N2}$, where $U_{Nh}$, h = 1, 2, contains the elements $x_{\nu Nh}$. 

Let 

$$z_{nh} = \sum_i c_{ni}\, x'_{ih},$$

and let 

$$Z_{nh} = \frac{z_{nh} - E z_{nh}}{\sigma_{z_{nh}}},$$

where the random variables $x'_{ih}$ are defined as were $x'_i$. 

Let 

$$\rho_N = \frac{\sum_\nu (x_{\nu N1} - \bar{x}_{N1})(x_{\nu N2} - \bar{x}_{N2})}{N(\mu_{2N1}\,\mu_{2N2})^{1/2}},$$

and suppose that $\lim \rho_N$ exists and is equal to $\rho$ where $\rho > -1 + \epsilon$. Then the 
limiting distribution of $Z_{n1}$ and $Z_{n2}$ is bivariate normal with means 0, variances 1, 
and correlation coefficient $\rho$. 

PROOF: To prove Theorem 3 we shall show that any linear function 
$l_1 Z_{n1} + l_2 Z_{n2}$ will be normally distributed in the limit if $l_1$ and $l_2$ are not both 
zero. It will then follow⁹ that the theorem is true. 

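If we define $U_N$ to be the sequence whose elements are 

$$\frac{l_1(x_{\nu N1} - \bar{x}_{N1})}{\mu_{2N1}^{1/2}} + \frac{l_2(x_{\nu N2} - \bar{x}_{N2})}{\mu_{2N2}^{1/2}},$$

then the arithmetic mean of $U_N$ is zero. Let $z_n$ be formed from $U_N$ as $z_{nh}$ was 
formed from $U_{Nh}$, and let 

$$Z_n = \frac{z_n - E z_n}{\sigma_{z_n}}.$$

Then, it is readily verified that 

$$Z_n = \frac{l_1 Z_{n1} + l_2 Z_{n2}}{\sigma_{l_1 Z_{n1} + l_2 Z_{n2}}}.$$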

⁸ The generalization holds for any finite number of components but, to simplify the 
discussion, is stated for two components only. The method used is due to H. Cramér, Random 
Variables and Probability Distributions, Cambridge University Press, London, 1937, p. 106. 
⁹ H. Cramér, Random Variables and Probability Distributions, Cambridge University 
Press, London, 1937, p. 106. 





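Consequently, to prove that $l_1 Z_{n1} + l_2 Z_{n2}$ has a normal limiting distribution we 
need to verify that the sequence $U_N$ satisfies the condition W if $U_{N1}$ and $U_{N2}$ do. 
The moments of $U_N$ are 

$$\mu_{rN} = \frac{1}{N}\sum_\nu \left[\frac{l_1(x_{\nu N1} - \bar{x}_{N1})}{\mu_{2N1}^{1/2}} + \frac{l_2(x_{\nu N2} - \bar{x}_{N2})}{\mu_{2N2}^{1/2}}\right]^r,$$

so that 

$$\mu_{2N} = l_1^2 + l_2^2 + 2 l_1 l_2 \rho_N,$$

where $\rho_N$ has the usual form of the correlation coefficient. Furthermore, using 
the binomial expansion, we have 

$$\mu_{rN} = \sum_{a=0}^{r} \binom{r}{a} l_1^a l_2^{r-a}\, \frac{\mu_{a,r-a,N}}{\mu_{2N1}^{a/2}\,\mu_{2N2}^{(r-a)/2}}, \tag{4.4}$$

where 

$$\mu_{a,r-a,N} = \frac{1}{N}\sum_\nu (x_{\nu N1} - \bar{x}_{N1})^a (x_{\nu N2} - \bar{x}_{N2})^{r-a}.$$

Then, by the Cauchy-Schwarz inequality we have 

$$\left[\sum_\nu (x_{\nu N1} - \bar{x}_{N1})^a (x_{\nu N2} - \bar{x}_{N2})^{r-a}\right]^2 \le \sum_\nu (x_{\nu N1} - \bar{x}_{N1})^{2a} \cdot \sum_\nu (x_{\nu N2} - \bar{x}_{N2})^{2r-2a},$$

so that 

$$|\mu_{a,r-a,N}| \le \mu_{2a,N1}^{1/2}\, \mu_{2r-2a,N2}^{1/2},$$

and using condition W for $U_{N1}$ and $U_{N2}$, we have 

$$\mu_{2a,N1} = \mu_{2N1}^{a}\, O(1), \qquad \mu_{2r-2a,N2} = \mu_{2N2}^{r-a}\, O(1).$$

Hence, substituting in (4.4) we see that 

$$\sup_N |\mu_{rN}| < \infty.$$

Hence the sequence $U_N$ satisfies the condition W for all $l_1$ and $l_2$, and Theorem 
3 is proved. 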

From Theorem 3, it then follows that the theorems on the limiting 
distributions of moments, product moments and functions of moments¹⁰ are valid for 
sampling from finite universes, at random without replacement. 

¹⁰ The most important of these theorems are given in H. Cramér, Mathematical Methods 
of Statistics, Princeton University Press, Princeton, 1946, sections 28.2-28.4, pp. 364-367. 



A NON-PARAMETRIC TEST OF INDEPENDENCE* 

By Wassily Hoeffding 
Institute of Statistics, University of North Carolina 

1. Summary. A test is proposed for the independence of two random variables 
with continuous distribution function (d.f.). The test is consistent with respect 
to the class $\Omega''$ of d.f.'s with continuous joint and marginal probability densities 
(p.d.). The test statistic D depends only on the rank order of the observations. 
The mean and variance of D are given and $\sqrt{n}(D - ED)$ is shown to have a 
normal limiting distribution for any parent distribution. In the case of 
independence this limiting distribution is degenerate, and nD has a non-normal 
limiting distribution whose characteristic function and cumulants are given. 
The exact distribution of D in the case of independence for samples of size 
n = 5, 6, 7 is tabulated. In the Appendix it is shown that there do not exist 
tests of independence based on ranks which are unbiased on any significance 
level with respect to the class $\Omega''$. It is also shown that if the parent distribution 
belongs to $\Omega''$ and for some $n \ge 5$ the probabilities of the n! rank permutations 
are equal, the random variables are independent. 

2. Introduction. In a non-parametric test of a statistical hypothesis we do 
not make any assumptions about the functional form of the population 
distribution. A general theory of non-parametric tests is not yet developed, and a 
satisfactory definition of "best" non-parametric tests does not seem to be 
available. Desirable properties of a "good" non-parametric test are unbiasedness and 
consistency. A test of a hypothesis $H_0$ is said to be consistent with respect to a 
specified class of admissible hypotheses if the probability of accepting $H_0$ tends 
to zero with increasing sample size whenever a hypothesis $H \ne H_0$ of this class 
is true. 

In this paper we consider the problem of testing the independence of two 
random variables X, Y on the basis of a random sample of size n. In all that 
follows the d.f. F(x, y) of (X, Y) is assumed to be continuous. We will denote 
by $\Omega'$ the class of continuous d.f.'s F(x, y) and by $\Omega''$ the class of d.f.'s having 
continuous joint and marginal p.d.'s, 

$$f(x, y) = \partial^2 F(x, y)/\partial x\, \partial y, \quad f_1(x) = \int f(x, y)\, dy, \quad f_2(y) = \int f(x, y)\, dx.$$

The hypothesis $H_0$ to be tested is that F(x, y) is of the form 

$$F(x, y) = F(x, \infty)\, F(\infty, y).$$

Several tests of this hypothesis have been proposed. Among them those 
deserve particular attention which depend only on the rank order of the 
observations. They will be referred to as rank tests. The critical region of a rank 
test of independence with respect to the class $\Omega'$ is similar to the sample space; 
the rank tests share this property with other tests obtained by the method of 
randomization (cf. Scheffé [1]). A characteristic feature of a rank test is that it 
remains invariant under order preserving transformations of X or Y. 

* Research under a contract with the Office of Naval Research for development of 
multivariate statistical theory. 

Rank tests of independence have been studied by Hotelling and Pabst [2], 
Kendall [3] and Wolfowitz [4]. While nothing is yet known about the power of 
the last test, the author [5] has shown that the two former tests are asymptotically 
biased for certain alternatives belonging to $\Omega'$. By a slight modification of the 
examples given in [5] it can be shown that these tests are asymptotically biased 
even with respect to the class $\Omega''$. 

In the Appendix it is shown that there do not exist rank tests of independence 
which are unbiased on any level of significance with respect to the classes $\Omega'$ 
or $\Omega''$. It will appear from this paper that there do exist rank tests of 
independence which are consistent, and hence asymptotically unbiased, at least with 
respect to $\Omega''$. 


3. The Functional $\Delta$. Given a random sample from a population with a 
d.f. belonging to a class $\Omega$, we want to test the hypothesis $H_0$ that F is in a 
subclass $\omega$ of $\Omega$. It is easy to construct a consistent test of $H_0$ if there exist (a) a 
functional $\theta(F)$ defined for every F in $\Omega$ and such that $\theta(F) = 0$ if and only if 
F is in $\omega$ and (b) a consistent estimate of $\theta(F)$. There are many ways of devising 
by this method consistent tests of independence. The particular test described 
in the sequel has been chosen mainly for its relative simplicity. 

If F(x, y) is a bivariate d.f., let 

$$D(x, y) = F(x, y) - F(x, \infty)F(\infty, y)$$

and 

$$\Delta = \Delta(F) = \int D^2(x, y)\, dF(x, y). \tag{3.1}$$

Here and in the following, when no domain of integration is indicated, the 
(Lebesgue-Stieltjes) integral is extended over the entire space (here $R_2$). 
The random variables X, Y with the d.f. F(x, y) are independent if and only 
if $D(x, y) \equiv 0$. 

THEOREM 3.1. When F(x, y) belongs to $\Omega''$, $\Delta(F) = 0$ if and only if $D(x, y) \equiv 0$. 
PROOF. Evidently $D(x, y) \equiv 0$ implies $\Delta(F) = 0$. 

Now suppose that $D(x, y) \not\equiv 0$. Since F(x, y) is in $\Omega''$, the function 
$d(x, y) = f(x, y) - f_1(x) f_2(y)$ is continuous. We have 

$$D(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} d(u, v)\, du\, dv.$$

$D(x, y) \not\equiv 0$ implies $d(x, y) \not\equiv 0$, and since 

$$\iint d(x, y)\, dx\, dy = 0,$$

there exists a rectangle Q in $R_2$ such that d(x, y) > 0 if (x, y) is in Q. Hence 
$D(x, y) \ne 0$ almost everywhere in Q, and f(x, y) > 0 in Q. Thus 

$$\Delta(F) \ge \iint_Q D^2(x, y)\, f(x, y)\, dx\, dy > 0.$$

This completes the proof. 

If F(x, y) is discontinuous, we can have $\Delta(F) = 0$ and $D(x, y) \not\equiv 0$. This is, 
for instance, the case for the distribution 

$$P\{X = 0, Y = 1\} = P\{X = 1, Y = 0\} = \tfrac{1}{2}.$$

The question remains open whether $\Delta = 0$ implies $D(x, y) \equiv 0$ if F(x, y) is 
continuous or absolutely continuous. 
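
The two-point example can be checked by direct computation. For a purely discrete distribution the integral in (3.1) reduces to a sum of $D^2$ over the atoms, weighted by their masses; the sketch below evaluates it under that assumption. The helper name `delta_and_d` and the dictionary representation of the distribution are illustrative choices.

```python
from fractions import Fraction


def delta_and_d(masses):
    """masses maps atoms (x, y) to probabilities.
    Returns (Delta, D) where D(x, y) = F(x, y) - F(x, inf) F(inf, y) and
    Delta is the sum of D(atom)^2 times the mass of the atom."""
    def joint(x, y):
        return sum(p for (a, b), p in masses.items() if a <= x and b <= y)

    def marginal_x(x):
        return sum(p for (a, _), p in masses.items() if a <= x)

    def marginal_y(y):
        return sum(p for (_, b), p in masses.items() if b <= y)

    def d(x, y):
        return joint(x, y) - marginal_x(x) * marginal_y(y)

    delta = sum(p * d(x, y) ** 2 for (x, y), p in masses.items())
    return delta, d


half = Fraction(1, 2)
delta, d = delta_and_d({(0, 1): half, (1, 0): half})
print(delta, d(0, 0))
```

Here D vanishes at both atoms, so the functional is zero, while D(0, 0) = -1/4 at a point carrying no mass.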
In Section 7 it will be shown that 

$$-\tfrac{1}{60} \le \Delta \le \tfrac{1}{30}.$$

The upper bound is attained when F(x, y) is the (continuous) d.f. of a 
random variable (X, Y) such that X has any continuous d.f. and Y = X (or, 
more generally, Y is a monotone function of X). 

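Let 

$$C(u) = 1 \text{ if } u \ge 0, \qquad C(u) = 0 \text{ if } u < 0,$$

$$\psi(x_1, x_2, x_3) = C(x_1 - x_2) - C(x_1 - x_3), \tag{3.2}$$

$$\phi(x_1, y_1; \dots; x_5, y_5) = \tfrac{1}{4}\, \psi(x_1, x_2, x_3)\psi(x_1, x_4, x_5)\psi(y_1, y_2, y_3)\psi(y_1, y_4, y_5).$$

Then we can write 

$$\Delta = \int \cdots \int \phi(x_1, y_1; \dots; x_5, y_5)\, dF(x_1, y_1) \cdots dF(x_5, y_5). \tag{3.3}$$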

4. The Statistic D. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from 
a population with the d.f. F(x, y), $n \ge 5$, and let 

$$D = D_n = \frac{1}{n(n-1)\cdots(n-4)} \sum{}'' \phi(X_{\alpha_1}, Y_{\alpha_1}; \dots; X_{\alpha_5}, Y_{\alpha_5}), \tag{4.1}$$

where $\sum''$ denotes summation over all $\alpha$ such that 

$$\alpha_i = 1, \dots, n; \qquad \alpha_i \ne \alpha_j \text{ if } i \ne j, \qquad (i, j = 1, \dots, 5).$$

Since the number of terms in $\sum''$ is $n(n-1)\cdots(n-4)$, we have by (3.3), 

$$ED = \Delta. \tag{4.2}$$

Since in the case of independence ED = 0, D can assume both positive and 
negative values. It will be seen in Section 7 that $-\tfrac{1}{60} \le D_n \le \tfrac{1}{30}$, the upper 
bound being attained for every n, while the minimum of $D_n$ apparently 
increases with n. 





The random variable D as defined by (4.1) belongs to the class of U-statistics 
considered by the author [5]. The following properties of D follow immediately 
from the results of that paper: 

I. Let 

$$\Phi(x_1, y_1; \dots; x_5, y_5) = D_5 = \frac{1}{5!} \sum{}'' \phi(x_{\alpha_1}, y_{\alpha_1}; \dots; x_{\alpha_5}, y_{\alpha_5}),$$

$$\Phi_k(x_1, y_1; \dots; x_k, y_k) = \int \cdots \int \Phi(x_1, y_1; \dots; x_k, y_k; x_{k+1}, y_{k+1}; \dots; x_5, y_5)\, dF(x_{k+1}, y_{k+1}) \cdots dF(x_5, y_5), \quad (k = 1, \dots, 5),$$

$$\zeta_k = \int \cdots \int \{\Phi_k(x_1, y_1; \dots; x_k, y_k) - \Delta\}^2\, dF(x_1, y_1) \cdots dF(x_k, y_k).$$

Then the variance of $D_n$ is 

$$\operatorname{var} D_n = \binom{n}{5}^{-1} \sum_{k=1}^{5} \binom{5}{k}\binom{n-5}{5-k} \zeta_k. \tag{4.3}$$

We have 

$$25\zeta_1 \le n \operatorname{var} D_n \le 5\zeta_5.$$

$n \operatorname{var} D_n$ is a decreasing function of n, and 

$$\lim_{n \to \infty} n \operatorname{var} D_n = 25\zeta_1. \tag{4.4}$$

II. By Theorem 7.1, [5], the random variable $\sqrt{n}(D_n - \Delta)$ has a normal 
limiting distribution with mean zero and variance $25\zeta_1$. 

It will be seen in section 6 that in the case of independence $\zeta_1 = 0$, so that 
the normal limiting distribution of $\sqrt{n}\, D_n$ is a degenerate one. In this case 
$nD_n$ has a non-normal limiting distribution (see section 8). 

5. Computation of D. From (4.1) and (3.2) we get after reduction 

$$D = \frac{A - 2(n-2)B + (n-2)(n-3)C}{n(n-1)(n-2)(n-3)(n-4)}, \tag{5.1}$$

where 

$$A = \sum_{\alpha=1}^{n} a_\alpha(a_\alpha - 1)\, b_\alpha(b_\alpha - 1), \quad B = \sum_{\alpha=1}^{n} (a_\alpha - 1)(b_\alpha - 1)\, c_\alpha, \quad C = \sum_{\alpha=1}^{n} c_\alpha(c_\alpha - 1), \tag{5.2}$$

and 

$$a_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta) - 1, \quad b_\alpha = \sum_{\beta=1}^{n} C(Y_\alpha - Y_\beta) - 1, \quad c_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta)\, C(Y_\alpha - Y_\beta) - 1.$$

$a_\alpha + 1$ and $b_\alpha + 1$ are the ranks of $X_\alpha$ and $Y_\alpha$, respectively. $c_\alpha$ is the number 
of sample members $(X_\beta, Y_\beta)$ for which both $X_\beta < X_\alpha$ and $Y_\beta < Y_\alpha$. (Since 
F(x, y) is continuous we may assume that $X_\alpha \ne X_\beta$ and $Y_\alpha \ne Y_\beta$ if $\alpha \ne \beta$.) 
Thus, to compute D for a given sample we have to determine the numbers 
$a_\alpha$, $b_\alpha$, $c_\alpha$ for each sample member, calculate A, B, C from (5.2) and insert 
them in (5.1). 
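
The computation just described can be sketched in a few lines. The function below assumes the readings of (5.1) and (5.2) given as A = Σ a(a-1)b(b-1), B = Σ (a-1)(b-1)c, C = Σ c(c-1), with a, b, c counted as the numbers of sample members lying strictly below in x, in y, and in both; the name `hoeffding_d` is an illustrative choice, and ties are not handled.

```python
from fractions import Fraction


def hoeffding_d(xs, ys):
    """D of (5.1) for a sample without ties, returned as an exact fraction."""
    n = len(xs)
    if n < 5 or len(ys) != n:
        raise ValueError("need two equally long samples with n >= 5")
    A = B = C = 0
    for i in range(n):
        a = sum(1 for j in range(n) if xs[j] < xs[i])  # rank of x_i minus 1
        b = sum(1 for j in range(n) if ys[j] < ys[i])  # rank of y_i minus 1
        c = sum(1 for j in range(n) if xs[j] < xs[i] and ys[j] < ys[i])
        A += a * (a - 1) * b * (b - 1)
        B += (a - 1) * (b - 1) * c
        C += c * (c - 1)
    numerator = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    return Fraction(numerator, n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


x = [1, 2, 3, 4, 5]
print(hoeffding_d(x, [1, 2, 3, 4, 5]))  # perfectly monotone sample
print(hoeffding_d(x, [1, 4, 3, 2, 5]))
```

The double loop costs on the order of n² comparisons, which is adequate for the small samples considered here; a sort-based count of c would be preferable for large n.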


6. The variance of D in the case of independence. Since F(x, y) is assumed 
to be continuous, so are $F(x, \infty)$ and $F(\infty, y)$. The inequalities $x_1 < x_2$ and 
$F(x_1, \infty) < F(x_2, \infty)$ are then equivalent unless $F(x_1, \infty) = F(x_2, \infty)$. The 
same is true of $y_1 < y_2$ and $F(\infty, y_1) < F(\infty, y_2)$. This shows that the function 
$\phi$, (3.2), does not change its value if $x_i$, $y_i$ is replaced by $F(x_i, \infty)$, $F(\infty, y_i)$, except 
perhaps on a set of zero probability. Hence $\Delta$ and D are invariant under the 
transformation 

$$u = F(x, \infty), \quad v = F(\infty, y); \qquad U = F(X, \infty), \quad V = F(\infty, Y).$$

In the case of independence we have F(x, y) = uv, and 

$$\zeta_k = \int_0^1 \cdots \int_0^1 \{\Phi_k^*(u_1, v_1; \dots; u_k, v_k)\}^2\, du_1\, dv_1 \cdots du_k\, dv_k,$$

where $\Phi_k^*$ is defined as $\Phi_k$, with $x_i$, $y_i$ and $F(x_i, y_i)$ replaced by $u_i$, $v_i$ and $u_i v_i$ 
respectively. On evaluation of these definite integrals we get 

$$\zeta_1 = 0, \quad 200 \cdot 30^2 \zeta_2 = \tfrac{2}{9}, \quad 600 \cdot 30^2 \zeta_3 = \tfrac{14}{3}, \quad 600 \cdot 30^2 \zeta_4 = \tfrac{164}{9}, \quad 120 \cdot 30^2 \zeta_5 = 12.$$

On inserting these values in (4.3) we obtain 

$$\operatorname{var}(30D) = \frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)}. \tag{6.1}$$

Another way to determine the coefficients $\zeta_k$ in the case of independence is to 
compute var $D_n$ for n = 5, 6, 7 from the exact distributions given in section 7, 
and $\lim_{n \to \infty} n^2 \operatorname{var} D_n$ from the asymptotic distribution of $nD_n$ (section 8). 
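
The agreement between (6.1) and the exact distributions can be checked mechanically. The sketch below assumes that (6.1) reads var(30D) = 2(n² + 5n - 32) / (9n(n-1)(n-3)(n-4)) and uses the frequencies of Table I as we read them; the dictionary layout and function names are illustrative.

```python
from fractions import Fraction

# n: (scale, {x: number of equally weighted cases with scale * D_n = x})
TABLE_I = {
    5: (60, {-1: 2, 0: 12, 2: 1}),
    6: (180, {-2: 4, -1: 28, 0: 36, 1: 16, 2: 1, 3: 4, 6: 1}),
    7: (1260, {-11: 8, -8: 32, -7: 32, -6: 8, -5: 28, -4: 88, -3: 64,
               -2: 56, -1: 8, 0: 88, 2: 77, 3: 24, 4: 4, 6: 56, 8: 8,
               9: 4, 12: 24, 14: 2, 18: 12, 24: 2, 30: 4, 42: 1}),
}


def exact_variance_30d(n):
    """Variance of 30 D_n computed from the tabulated exact distribution."""
    scale, counts = TABLE_I[n]
    total = sum(counts.values())
    mean = sum(Fraction(30 * x, scale) * c for x, c in counts.items()) / total
    second = sum(Fraction(30 * x, scale) ** 2 * c for x, c in counts.items()) / total
    return second - mean ** 2


def variance_formula(n):
    """Right-hand side of (6.1)."""
    return Fraction(2 * (n * n + 5 * n - 32), 9 * n * (n - 1) * (n - 3) * (n - 4))


for n in (5, 6, 7):
    print(n, exact_variance_30d(n), variance_formula(n))
```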


7. The exact distribution of D in the case of independence for n = 5, 6, 7. 
Let $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a sample from a population with a 
continuous d.f. We may confine ourselves to samples with $x_i \ne x_j$ and $y_i \ne y_j$ if 
$i \ne j$. Let $(x'_1, y'_{\beta_1}), \dots, (x'_n, y'_{\beta_n})$ be a rearrangement of $(x_1, y_1), \dots, (x_n, y_n)$ 
such that $x'_1 < x'_2 < \dots < x'_n$ and $y'_1 < y'_2 < \dots < y'_n$. The permutation 
$\Pi = (\beta_1, \dots, \beta_n)$ of (1, ..., n) will be referred to as the ranking of the sample S. 



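$D_n$ depends on the sample only through its ranking, $D_n = D_n(\Pi)$, and by (4.1) 

$$\binom{n}{5} D_n(\Pi) = \sum{}' D_5(\beta_{\alpha_1}, \dots, \beta_{\alpha_5}), \tag{7.1}$$

where $\sum'$ stands for summation over all $\alpha$ such that $1 \le \alpha_1 < \alpha_2 < \dots < \alpha_5 \le n$, 
and $D_5$ is evaluated at the ranking of the five sample members selected. 
Denoting by $\Pi^{(i)}$ the permutation obtained from $\Pi = (\beta_1, \dots, \beta_n)$ by 
omitting $\beta_i$, we have the recursion formula 

$$n D_n(\Pi) = \sum_{i=1}^{n} D_{n-1}(\Pi^{(i)}). \tag{7.2}$$

From (4.1) and (3.2) we obtain 

$$60 D_5(\beta_1, \dots, \beta_5) = \psi(\beta_3, \beta_1, \beta_4)\psi(\beta_3, \beta_2, \beta_5) + \psi(\beta_3, \beta_1, \beta_5)\psi(\beta_3, \beta_2, \beta_4)$$

or 

$$60 D_5(\beta_1, \dots, \beta_5) = \begin{cases} 0 & \text{if } \beta_3 \ne 3; \\ 2 & \text{if } \beta_3 = 3 \text{ and } \beta_1, \beta_2 < 3 \text{ or } \beta_1, \beta_2 > 3; \\ -1 & \text{if } \beta_3 = 3 \text{ and } \beta_1 < 3, \beta_2 > 3 \text{ or } \beta_1 > 3, \beta_2 < 3. \end{cases} \tag{7.3}$$

We have 

$$D_n(\beta_1, \dots, \beta_n) = D_n(\beta_2, \beta_1, \beta_3, \dots, \beta_n) = D_n(\beta_1, \dots, \beta_{n-2}, \beta_n, \beta_{n-1}) = D_n(\beta_n, \beta_{n-1}, \dots, \beta_1). \tag{7.4}$$

For n = 5 this follows from (7.3) and for general n from (7.1). 

Also, by the symmetry of $D_n$ with respect to x and y, $D_n$ does not change its 
value if in the permutation $(\beta_1, \dots, \beta_n)$ the numbers 1, 2 or n - 1, n are 
interchanged or the permutation is replaced by its inverse. 
In the case of independence all n! rankings have the same probability 1/n!. 
To find the distribution of $D_n$ we have to determine the number of rankings 
giving rise to particular values of $D_n$. 

If n = 5 there are 5! = 120 rankings. Owing to (7.4) we need consider only 
those with $\beta_1 < \beta_2$, $\beta_4 < \beta_5$, $\beta_1 < \beta_4$. Their number is 120/8 = 15. Among 
them those with $\beta_3 \ne 3$ yield $D_5 = 0$; this leaves only the three permutations 

(1, 2, 3, 4, 5), (1, 4, 3, 2, 5), (1, 5, 3, 2, 4). 

By (7.3) the respective values of $60D_5$ are 2, -1, -1. Thus we have 

$$P\{60D_5 = 2\} = \tfrac{1}{15}, \quad P\{60D_5 = -1\} = \tfrac{2}{15}, \quad P\{60D_5 = 0\} = \tfrac{4}{5}.$$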
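
The distribution for n = 5 can also be obtained by brute force from the definition (4.1), without using the reductions above. The sketch assumes C(u) = 1 for u ≥ 0 and 0 otherwise, ψ(x₁, x₂, x₃) = C(x₁ - x₂) - C(x₁ - x₃), and φ equal to one quarter of the product of the four ψ factors; all names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def c(u):
    return 1 if u >= 0 else 0


def psi(x1, x2, x3):
    return c(x1 - x2) - c(x1 - x3)


def phi(p1, p2, p3, p4, p5):
    """Kernel of (3.2) evaluated at five points (x, y)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5) = p1, p2, p3, p4, p5
    product = (psi(x1, x2, x3) * psi(x1, x4, x5)
               * psi(y1, y2, y3) * psi(y1, y4, y5))
    return Fraction(product, 4)


def d5(ranking):
    """D_5 for the sample (1, b1), ..., (5, b5): average of phi over all orderings."""
    points = list(enumerate(ranking, start=1))
    return sum(phi(*order) for order in permutations(points)) / 120


# Frequency of each value of 60 * D_5 over the 120 equally likely rankings.
distribution = Counter(60 * d5(r) for r in permutations(range(1, 6)))
print(sorted(distribution.items()))
```

Dividing the frequencies by 120 gives the probabilities of the three attainable values of $60D_5$.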





The distribution of $D_6$, $D_7$, ... can be obtained in a similar way using the 
relations (7.1) to (7.4). The distribution of $D_n$ for n = 5, 6, 7 is given in 
Table I. 

From (7.3) and (7.1) it follows that $-\tfrac{1}{60} \le D_n \le \tfrac{1}{30}$ for n = 5, 6, .... 
The upper bound $\tfrac{1}{30}$ is attained for $\Pi = (1, 2, \dots, n)$ and every n. To judge 
by the cases n = 5, 6, 7, the minimum of $D_n$ apparently increases with n. From 
$ED_n = \Delta$ it also follows that $-\tfrac{1}{60} \le \Delta \le \tfrac{1}{30}$. 


8. The Asymptotic Distribution of $nD_n$ in the Case of Independence. 
THEOREM 8.1. If $F(x, y) = F(x, \infty)F(\infty, y)$ and $F(x, \infty)$ and $F(\infty, y)$ are 
continuous, the random variable $nD_n + \tfrac{1}{36}$ has a limiting distribution whose 
characteristic function (c.f.) is 

$$g(t) = \prod_{k=1}^{\infty} \left(1 - \frac{2it}{\pi^4 k^2}\right)^{-\tau(k)/2}, \tag{8.1}$$

where $\tau(k)$ is the number of divisors of k. 

Note that $\tau(k)$ is the number of divisors of k including 1 and k. Thus $\tau(1) = 1$, 
$\tau(2) = 2$, $\tau(3) = 2$, $\tau(4) = 3$, .... 

The author has not been able to bring the d.f. corresponding to the c.f. g(t) 
into a form suitable for numerical computation. Thus Theorem 8.1 may be 
considered as a preliminary result. For this reason only a brief indication of 
the proof is given here. 

If $(X_1, Y_1), \dots, (X_n, Y_n)$ is a random sample from a population with d.f. 
$F(x, \infty)F(\infty, y)$, let $nS_n(x, y)$ be the number of sample members $(X_i, Y_i)$ such 
that $X_i \le x$, $Y_i \le y$. $S_n(x, y)$ is a d.f. depending on the random sample. If 
we put $F(x, y) = S_n(x, y)$ in $\Delta(F)$ as defined by (3.3), we get 

$$\Delta(S_n) = \frac{1}{n^5} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_5=1}^{n} \phi(X_{\alpha_1}, Y_{\alpha_1}; \dots; X_{\alpha_5}, Y_{\alpha_5}).$$

It is easy to prove that if $n\{\Delta(S_n) - E\Delta(S_n)\}$ has a limiting distribution, it is 
the same as that of $nD_n$. 

Now it can be shown that $n\Delta(S_n)$ has a limiting distribution with the c.f. (8.1). 
This can be done either analogously to Smirnoff's [6] derivation of the limiting 
distribution of the goodness of fit statistic $\omega^2$, or applying von Mises' [7] general 
results on the asymptotic distribution of a differentiable statistical function. 
Though the latter paper deals only with univariate distributions, its results can 
be extended to the multivariate case. 

By expanding log g(t) in powers of it we obtain for the j-th cumulant $\kappa_j$ 

$$\kappa_j = \frac{2^{5j-3}(j-1)!}{[(2j)!]^2}\, B_{2j-1}^2,$$

where $B_{2j-1}$ are Bernoulli's numbers, 

$$B_1 = \tfrac{1}{6}, \quad B_3 = \tfrac{1}{30}, \quad B_5 = \tfrac{1}{42}, \dots.$$

In particular, $\kappa_1 = \tfrac{1}{36}$, and since $ED_n = 0$, the limiting distribution of $n\Delta(S_n)$ 
is that of $nD_n + \tfrac{1}{36}$. 
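
The cumulants can be cross-checked numerically. The sketch below assumes the c.f. has the product form g(t) = Π (1 - 2it/(π⁴k²))^(-τ(k)/2), so that κ_j = 2^(j-1) (j-1)! Σ τ(k) / (π⁴k²)^j, and compares a truncated series with the closed form κ_j = 2^(5j-3) (j-1)! B² / ((2j)!)², where B runs through 1/6, 1/30, 1/42. The truncation point and all names are illustrative choices.

```python
from fractions import Fraction
from math import factorial, pi

# j-th Bernoulli number in the positive-valued convention: 1/6, 1/30, 1/42.
BERNOULLI = {1: Fraction(1, 6), 2: Fraction(1, 30), 3: Fraction(1, 42)}


def cumulant_closed_form(j):
    return (Fraction(2 ** (5 * j - 3) * factorial(j - 1), factorial(2 * j) ** 2)
            * BERNOULLI[j] ** 2)


def divisor_counts(limit):
    """tau[k] = number of divisors of k for 1 <= k <= limit (simple sieve)."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            tau[multiple] += 1
    return tau


def cumulant_from_series(j, terms=20000):
    tau = divisor_counts(terms)
    total = sum(tau[k] / (pi ** 4 * k ** 2) ** j for k in range(1, terms + 1))
    return 2 ** (j - 1) * factorial(j - 1) * total


for j in (1, 2, 3):
    print(j, float(cumulant_closed_form(j)), cumulant_from_series(j))
```

The series for j = 1 converges slowly, so only rough agreement should be expected there; for j ≥ 2 the truncated sum is accurate to many digits. The second cumulant also agrees with the limit of n² var D_n implied by (6.1).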


9. The D-test of Independence. Given a random sample from a bivariate 
population with continuous d.f., a test for independence can now be carried out 
as follows: 

If $\alpha$ ($0 < \alpha < 1$) is the desired level of significance, let $\rho_n$ be the smallest number 
satisfying the inequality 

$$P\{D_n > \rho_n \mid F \in \omega\} \le \alpha,$$

where $\omega$ is the class of d.f.'s of the form $F(x, \infty)F(\infty, y)$. 

Compute $D_n$ as shown in section 5. Reject the hypothesis $H_0$ of independence 
if and only if $D_n > \rho_n$. 

For n = 5, 6, 7 the numbers $\rho_n$ can be obtained from Table I. 

From Tchebycheff's inequality and (6.1) we have 

$$P\left\{30D_n \ge \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\alpha}}\right\} \le \alpha.$$

Hence 

$$30\rho_n \le \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\alpha}}.$$

It follows that $\rho_n = O(n^{-1})$. 

If $\Delta > 0$, we have $\Delta - \rho_n > 0$ for sufficiently large n. Then 

$$P\{D_n > \rho_n\} \ge P\{|D_n - \Delta| < \Delta - \rho_n\} \ge 1 - (\operatorname{var} D_n)/(\Delta - \rho_n)^2.$$

By (4.4) the right hand side tends to 1. 

This, together with Theorem 3.1, shows that the D-test is consistent with 
respect to the class $\Omega''$. 

Since $P\{D_n \le 0\}$ tends to 0 if $\Delta > 0$, it is safe not to reject $H_0$ whenever 
$D_n \le 0$. An inspection of Table I shows that at least for small n this will 
happen in more than one-half of the cases if $H_0$ is true. 
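
For n = 5, 6, 7 the critical value can be read off mechanically. The sketch below takes the frequencies of Table I as we read them, returns the smallest attainable value ρ with P{D_n > ρ} ≤ α under independence, and compares it with the Tchebycheff bound of this section. The table encoding and the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

# n: (scale, {x: number of equally weighted cases with scale * D_n = x})
TABLE_I = {
    5: (60, {-1: 2, 0: 12, 2: 1}),
    6: (180, {-2: 4, -1: 28, 0: 36, 1: 16, 2: 1, 3: 4, 6: 1}),
    7: (1260, {-11: 8, -8: 32, -7: 32, -6: 8, -5: 28, -4: 88, -3: 64,
               -2: 56, -1: 8, 0: 88, 2: 77, 3: 24, 4: 4, 6: 56, 8: 8,
               9: 4, 12: 24, 14: 2, 18: 12, 24: 2, 30: 4, 42: 1}),
}


def critical_value(n, alpha):
    """Smallest rho with P{D_n > rho} <= alpha in the case of independence."""
    scale, counts = TABLE_I[n]
    total = sum(counts.values())
    above = 0                      # cases with scale * D_n strictly above x
    rho = max(counts)              # P{D_n > maximum} = 0 always qualifies
    for x in sorted(counts, reverse=True):
        if Fraction(above, total) > alpha:
            break
        rho = x
        above += counts[x]
    return Fraction(rho, scale)


def tchebycheff_bound(n, alpha):
    """Upper bound for rho_n obtained from (6.1)."""
    variance_30d = 2 * (n * n + 5 * n - 32) / (9 * n * (n - 1) * (n - 3) * (n - 4))
    return sqrt(variance_30d / alpha) / 30


print(critical_value(7, Fraction(1, 20)), tchebycheff_bound(7, 0.05))
```

With so few attainable values the exact level is usually well below the nominal α, and the Tchebycheff bound is visibly conservative.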


10. Concluding Remarks. It would be interesting to compare the power of 
the D-test with that of other tests with respect to particular alternatives, for 
instance with the product moment correlation test when the population is normal 
with correlation $\rho$. A preliminary investigation seems to indicate that for small 
values of $|\rho|$ and $n \to \infty$ the power efficiency of the D-test as compared with the 
product moment correlation test is rather low. The same appears to hold 
for values of n which are of practical interest. On the other hand, it is to be 
expected that a test which is consistent with respect to a wide class of alternatives 
will have a lower power with regard to a sub-class of alternatives than a test 
which has optimum properties with respect to this particular sub-class. These 
considerations suggest the problem of selecting from a given class of 
non-parametric tests (such as those consistent with respect to $\Omega''$) a test which is most 
powerful with respect to certain parametric alternatives (such as normal 
distributions). 


TABLE I 

The distribution of D_n in the case of independence for n = 5, 6, 7 

n = 5 

    x    15 P{60 D_5 = x}    P{60 D_5 ≥ x} 
   -1           2                1.0000 
    0          12                0.8667 
    2           1                0.0667 

n = 6 

    x    90 P{180 D_6 = x}   P{180 D_6 ≥ x} 
   -2           4                1.0000 
   -1          28                0.9556 
    0          36                0.6444 
    1          16                0.2444 
    2           1                0.0667 
    3           4                0.0556 
    6           1                0.0111 

n = 7 

    x    630 P{1260 D_7 = x}  P{1260 D_7 ≥ x} 
  -11           8                1.0000 
   -8          32                0.9873 
   -7          32                0.9365 
   -6           8                0.8857 
   -5          28                0.8730 
   -4          88                0.8286 
   -3          64                0.6889 
   -2          56                0.5873 
   -1           8                0.4984 
    0          88                0.4857 
    2          77                0.3460 
    3          24                0.2238 
    4           4                0.1857 
    6          56                0.1794 
    8           8                0.0905 
    9           4                0.0778 
   12          24                0.0714 
   14           2                0.0333 
   18          12                0.0302 
   24           2                0.0111 
   30           4                0.0079 
   42           1                0.0016 



APPENDIX 

A. Equiprobable rankings and independence. Let $\Pi_{n\nu}$, ($\nu = 1, 2, \dots, n!$) 
be the n! possible rankings of samples of size n from a bivariate population with 
continuous d.f. F(x, y) (cf. section 7). 

If $F(x, y) = F(x, \infty)F(\infty, y)$ we have 

$$P(\Pi_{n\nu}) = 1/n! \qquad (\nu = 1, \dots, n!) \tag{A1}$$

for every n. 

Does (A1) for some particular n imply independence? This is not true for 
n = 2. In this case (A1) is equivalent to $P\{(1, 2)\} = \tfrac{1}{2}$. If the distribution 
has a p.d. f(x, y), we have 

$$P\{(1, 2)\} = 2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\, du\, dv\, f(x, y)\, dx\, dy,$$

which equals $\tfrac{1}{2}$ whenever f(x, y) = f(-x, y). However, we have the following 
theorem: 

THEOREM. If F(x, y) is in $\Omega''$ and (A1) holds for some $n \ge 5$, then 

$$F(x, y) = F(x, \infty)F(\infty, y). \tag{A2}$$

PROOF. (4.2) can be written in the form 

$$\sum_{\nu=1}^{n!} D_n(\Pi_{n\nu})\, P(\Pi_{n\nu}) = \Delta. \tag{A3}$$

If (A1) holds, the left hand side of (A3) has the same value as when (A2) is true. 
But in the latter case we have $\Delta = 0$. Hence (A1) implies $\Delta = 0$. By Theorem 
3.1 this is sufficient for (A2). The proof is complete. 


B. Non-existence of unbiased rank tests of independence. 

THEOREM. There do not exist rank tests of independence which are unbiased on 
any significance level with respect to the classes $\Omega'$ or $\Omega''$. 

PROOF: Let $\Pi_{n\nu}$ have the meaning of Appendix A. Any critical region of a 
rank test of independence is a set $S_m = \{\Pi_{n\nu_1}, \dots, \Pi_{n\nu_m}\}$ of m rankings. In 
the case of independence $P(S_m) = P\{\Pi_n \in S_m\} = m/n!$. We may confine 
ourselves to significance levels m/n!, m = 1, 2, ..., n! - 1. To prove the 
theorem it is sufficient to show that for every n = 2, 3, ..., for some 
m ($1 \le m \le n! - 1$) and every $S_m$ there exists a d.f. F in $\Omega''$ such that 

$$P(S_m \mid F) < m/n!.$$

We shall prove the slightly more general proposition that this holds for 
m = 1, 2, 3. 

Let the bivariate distribution $A_n$ be such that the probability mass is 
distributed uniformly on the n - 1 segments 

$$\frac{k-1}{n-1} \le x \le \frac{k}{n-1}, \qquad y - x = \frac{n - 2k}{n-1}, \qquad (k = 1, 2, \dots, n-1), \tag{B1}$$

and is zero in any region not containing a part of these segments. 

Let $B_n$ be the distribution which is uniform on the n - 1 segments 

$$\frac{k-1}{n-1} \le x \le \frac{k}{n-1}, \qquad x + y = \frac{2k - 1}{n-1}, \qquad (k = 1, 2, \dots, n-1), \tag{B2}$$

and zero elsewhere. 

The d.f.'s of both $A_n$ and $B_n$ are continuous, with 

$$F(x, \infty) = F(\infty, x) = x \qquad (0 \le x \le 1).$$

Since the probability of (X, Y) lying on any one of the segments (B1) or (B2) 
is 1/(n - 1), the probabilities $P(\Pi_n \mid A_n)$ and $P(\Pi_n \mid B_n)$ are easily obtained in 
terms of the multinomial distribution with n - 1 equal probabilities. In 
particular, we have 

$$P(1, 2, \dots, n \mid A_2) = 1; \qquad P(n, n-1, \dots, 1 \mid B_2) = 1, \tag{B3}$$

$$P(1, 2, \dots, n \mid A_n) = P(n, n-1, \dots, 1 \mid B_n) = (n-1)\left(\frac{1}{n-1}\right)^n, \tag{B4}$$

$$P(n, n-1, \dots, 1 \mid A_n) = P(1, 2, \dots, n \mid B_n) = 0.$$

In general, if $\Pi_n$ is any permutation of 1, ..., n, we have either $P(\Pi_n \mid A_n) = 0$ 
or $P(\Pi_n \mid B_n) = 0$. For any $\Pi_n$ with $P(\Pi_n \mid A_n) \ne 0$ contains at least one 
"run up" of 2 or more numbers (a sequence of consecutive numbers 
i, i + 1, ..., i + h) which is not preceded by smaller numbers or followed by 
larger numbers. On the other hand, if a $\Pi'_n$ with $P(\Pi'_n \mid B_n) \ne 0$ contains a 
"run up", it is either preceded by smaller numbers or followed by larger numbers. 
Hence if $P(\Pi_n \mid A_n) \ne 0$, then $P(\Pi_n \mid B_n) = 0$. Similarly, $P(\Pi_n \mid B_n) \ne 0$ 
implies $P(\Pi_n \mid A_n) = 0$. 

From (B3) it follows that for any set $S_m$ of m rankings which does not include 
(1, 2, ..., n) or (n, n - 1, ..., 1) we have either $P(S_m \mid A_2) = 0$ or 
$P(S_m \mid B_2) = 0$. Hence we need only consider critical regions containing both 
(1, 2, ..., n) and (n, n - 1, ..., 1). For m = 1 there are no such regions. For 
m = 2 there is just one. But from (B4) it follows that for n > 2, 

$$P(1, 2, \dots, n \mid A_n) + P(n, n-1, \dots, 1 \mid A_n) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{2}{n!}.$$

Finally, if $\Pi_n$ is any permutation other than (1, 2, ..., n) or (n, n - 1, ..., 1), 
we have, by the preceding arguments, either for $A_n$ or for $B_n$, 

$$P(1, 2, \dots, n) + P(n, n-1, \dots, 1) + P(\Pi_n) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{3}{n!}.$$
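
The ranking probabilities under the two singular distributions can be enumerated exactly. The sketch below assumes the reading of (B1) and (B2) in which the unit interval of x is cut into n - 1 equal blocks, the segments of the first distribution rise within each block while the blocks descend, and those of the second fall within each block while the blocks ascend. A sample is then described by the block counts, which are multinomial with equal probabilities. The function name and the labels "A" and "B" are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import factorial


def ranking_probabilities(n, kind):
    """Exact P(ranking) for a sample of size n from A_n (kind='A') or B_n (kind='B')."""
    blocks = n - 1
    probabilities = defaultdict(Fraction)
    for counts in product(range(n + 1), repeat=blocks):
        if sum(counts) != n:
            continue
        weight = Fraction(factorial(n), blocks ** n)   # multinomial probability
        for count in counts:
            weight /= factorial(count)
        ranking = []
        for k, count in enumerate(counts):
            if kind == "A":
                # later blocks lie lower in y; y increases with x inside a block
                offset = sum(counts[k + 1:])
                ranking.extend(range(offset + 1, offset + count + 1))
            else:
                # later blocks lie higher in y; y decreases with x inside a block
                offset = sum(counts[:k])
                ranking.extend(range(offset + count, offset, -1))
        probabilities[tuple(ranking)] += weight
    return dict(probabilities)


a3 = ranking_probabilities(3, "A")
b3 = ranking_probabilities(3, "B")
print(a3)
print(b3)
```

For n = 3 the two supports are disjoint, and each monotone ranking has probability 1/4 under one distribution and 0 under the other, in agreement with (B4).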

This completes the proof for d.f.'s in $\Omega'$. To prove the theorem for d.f.'s in 
$\Omega''$ we can replace the distributions $A_n$ and $B_n$ by distributions $A'_n$ and $B'_n$ having 
continuous joint and marginal densities and such that the probabilities $P(\Pi \mid A'_n)$ 
and $P(\Pi \mid B'_n)$ differ as little as we please from $P(\Pi \mid A_n)$ and $P(\Pi \mid B_n)$, 
respectively. For instance, $A'_2$ can be defined by the continuous density 

$$f(x, y) = \begin{cases} K(\epsilon + x - y) & \text{if } 0 \le y - x \le \epsilon,\ y \ge \epsilon,\ x \le 1 - \epsilon; \\ K(\epsilon - x + y) & \text{if } -\epsilon \le y - x \le 0,\ x \ge \epsilon,\ y \le 1 - \epsilon; \\ K(x + y - \epsilon) & \text{if } x + y \ge \epsilon,\ x \le \epsilon,\ y \le \epsilon; \\ K(2 - \epsilon - x - y) & \text{if } x + y \le 2 - \epsilon,\ x \ge 1 - \epsilon,\ y \ge 1 - \epsilon; \\ 0 & \text{elsewhere}, \end{cases}$$

where $K = 3/(3\epsilon^2 - 4\epsilon^3)$ and $0 < \epsilon < \tfrac{1}{2}$. If $\epsilon$ is taken sufficiently small, the 
distribution satisfies the requirements. The details are left to the reader. 

The proof also shows the non-existence of an unbiased rank test of 
independence for n = 2 and any level of significance (for we need consider only one 
level, $\tfrac{1}{2}$). It also can be shown that for n = 3, any m = 1, 2, ..., 5 and any 
$S_m$ the inequality $P(S_m) < m/3!$ holds for at least one of the distributions 
considered above. The question remains open whether there exist rank tests of 
independence which are unbiased for some sample sizes n and some significance 
levels m/n!. 

REFERENCES 


[1] H. Scheffé, "Statistical inference in the non-parametric case," Annals of Math. Stat., 
Vol. 14 (1943), pp. 305-332. 

[2] H. Hotelling and M. R. Pabst, "Rank correlation and tests of significance involving 
no assumptions of normality," Annals of Math. Stat., Vol. 7 (1936), pp. 29-43. 

[3] M. G. Kendall, "A new measure of rank correlation," Biometrika, Vol. 30 (1938), 
pp. 81-93. 

[4] J. Wolfowitz, "Additive partition functions and a class of statistical hypotheses," 
Annals of Math. Stat., Vol. 13 (1942), pp. 247-279. 

[5] W. Hoeffding, "A class of statistics with asymptotically normal distribution," Annals 
of Math. Stat., Vol. 19 (1948), pp. 293-326. 

[6] N. V. Smirnoff, "On the distribution of Mises' $\omega^2$-criterion," (Russian, with French 
summary), Matematicheskii Sbornik, Nov. Ser., Vol. 2 (1937), pp. 973-993. 

[7] R. von Mises, "On the asymptotic distribution of differentiable statistical functions," 
Annals of Math. Stat., Vol. 18 (1947), pp. 309-348. 



ON PREDICTION IN STATIONARY TIME SERIES 
By Herman O. A. Wold 
Uppsala University 

Summary. In time series analysis there are two lines of approach, here called 
the functional and the stochastic. In the former case, the given time series is 
interpreted as a mathematical function, in the latter case as a random specimen 
out of a universe of mathematical functions. The close relation between the 
two approaches is in section 2 shown to amount to a genuine isomorphism. 
Considering the problem of prediction from this viewpoint, the author gives in 
sections 3-4 the functional equivalence of his earlier theorem on the 
decomposition of a stationary stochastic process with a discrete time parameter (see [9], 
theorem 7). In section 5 the decomposition theorem is applied to the problem 
of linear prediction. Finally in section 6 a few comments are made. Since 
various aspects of the isomorphism in question are known, this paper might be 
regarded as essentially expository. 

1. Introductory. Let the sequence 

$$\dots, x_{t-1}, x_t, x_{t+1}, \dots \tag{1}$$

be an empirical time series such that no clear trend is present in the average 
level, in the variance or in any other structural properties of the series which we 
might choose to consider. Such series are usually called stationary as distinct 
from evolutive, terms which of course are somewhat loose when referring to 
empirical data. We shall consider two approaches in the theoretical analysis of 
stationary series. It is convenient to allow $x_t$ to be complex; the conjugate 
complex of $x_t$ is denoted $\bar{x}_t$. 

In the functional approach, the sequence (1) is regarded as forming an infinite 
sequence, say $\{x_t\}$, where t runs from $-\infty$ to $+\infty$. To define stationarity, let 
us for any infinite sequence $\{z_t\}$ write 

$$M[z_t] = \lim \frac{1}{t_2 - t_1 + 1} \sum_{t=t_1}^{t_2} z_t \qquad (t_1 \to -\infty,\ t_2 \to +\infty). \tag{2}$$

The limit $M[z_t]$, which will be called "the average of $z_t$", is clearly independent 
of t. It is also seen that a necessary and sufficient condition for $M[z_t]$ to exist is 
that the same average should be obtained when $t_1$ is kept fixed while $t_2 \to +\infty$ 
and when $t_2$ is kept fixed while $t_1 \to -\infty$. The stationarity of the sequence (1) 
may now be brought out by assumptions of the type that the averages $M[x_t]$ and 
$M[x_t \bar{x}_{t+k}]$ exist, say 

$$M[x_t] = m, \qquad M[x_t \bar{x}_{t+k}] = r_k \qquad (k = 0, \pm 1, \pm 2, \dots). \tag{3}$$
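
A finite-window analogue of these averages is easy to compute. The sketch below replaces the limit by an arithmetic mean over t₁ ≤ t ≤ t₂ and applies it to the hypothetical sequence x_t = exp(iωt), for which the product x_t x̄_{t+k} is the constant exp(-iωk) and the mean level tends to zero as the window grows. The frequency ω, the window, and the function names are illustrative assumptions.

```python
import cmath


def window_average(z, t1, t2):
    """Arithmetic mean of z(t) over t1 <= t <= t2, a finite stand-in for M[z_t]."""
    return sum(z(t) for t in range(t1, t2 + 1)) / (t2 - t1 + 1)


def lagged_average(x, k, t1, t2):
    """Finite stand-in for M[x_t * conj(x_{t+k})]."""
    return window_average(lambda t: x(t) * x(t + k).conjugate(), t1, t2)


omega = 0.7  # hypothetical angular frequency


def x(t):
    return cmath.exp(1j * omega * t)


m = window_average(x, -5000, 5000)
r2 = lagged_average(x, 2, -5000, 5000)
print(abs(m), r2, cmath.exp(-2j * omega))
```

For a series observed only on a finite stretch, such window means are all that can be computed; the assumptions (3) concern their behaviour as the window expands in both directions.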

In the stochastic (or probabilistic) approach, we introduce an infinite sequence 
of random variables, say 

$$\dots, \xi_{t-1}, \xi_t, \xi_{t+1}, \dots \qquad (-\infty < t < +\infty), \tag{4}$$

or briefly $\{\xi_t\}$. The sequence $\{\xi_t\}$ may be regarded as the generalization of the 
notion of multi-dimensional variable, say $[\xi_1, \dots, \xi_n]$, to an infinite number of 
components. According to a basic theorem by A. Kolmogoroff (see e.g. [9], 
§11), the probability distribution of the sequence $\{\xi_t\}$ may be defined by 
specifying for any finite set of variables, say $[\xi_{t_1}, \dots, \xi_{t_n}]$, its multi-dimensional 
distribution function, say 

$$F(u_1, \dots, u_n; t_1, \dots, t_n) = \operatorname{Prob}(\xi_{t_1} \le u_1, \dots, \xi_{t_n} \le u_n). \tag{5}$$

The sequence thus defined is said to constitute a stochastic process. As is 
sufficient for our purpose, we confine ourselves to the case when the time 
parameter t is restricted to discrete values, t = 0, ±1, ±2, .... 



Now in the stochastic approach, the empirical time series (1) is regarded as a 
sample specimen, a realization, of the stochastic process $\{\xi_t\}$, just as a point 
$[x_1, \dots, x_n]$ in an n-dimensional space may be regarded as a sample specimen 
of a multidimensional variable $[\xi_1, \dots, \xi_n]$. In line with this interpretation, the 
process $\{\xi_t\}$ may be regarded as a universe of individual realizations such as (1) 
(see the graph). Taking out a realization at random from this universe, we shall 
have the probability 

$$F(u_1; t_1) = \operatorname{Prob}(\xi_{t_1} \le u_1),$$

that the value taken on by the realization at the time point $t_1$ will be $\le u_1$; 
similarly, 

$$F(u_1, u_2; t_1, t_2) = \operatorname{Prob}(\xi_{t_1} \le u_1, \xi_{t_2} \le u_2),$$

is the joint probability that the values taken on by the realization at $t_1$ and $t_2$ 
will be $\le u_1$ and $\le u_2$ respectively. 

Any expectation referring to the variables (4) may be expressed in terms of the 
distribution functions (5), for instance 

Ale,] = [ = [ [ u-vdl,tfF(u, v; h, k). 

Again interpreting in terms of the univer.se of realizations, A[f,], say, is the aver¬ 
age, over this universe, of the value taken by the realizations at the time point i. 

The above definition of a stochastic process (4) being perfectly general, we have
to impose special assumptions if we wish to take into account particular properties
of the given time series (1). Thus stationarity of the process (4) may be
defined by assuming that any probability of the type (5) will remain the same
if $t_1, \ldots, t_n$ is replaced by $t_1 + t, \ldots, t_n + t$, where t is arbitrary. Alternatively,
and more generally, the stationarity of the sequence (1) may be brought out in
this approach by assuming that the expectations

$E[\xi_t] = m, \qquad E[\xi_t \xi_{t-k}] = \rho_k$

exist and are independent of t.

2. The functional and stochastic approaches are closely related as to problems
and results. A typical example is that $r_k$ and $\rho_k$ as defined above allow the
representations¹

(6) $r_k = \int e^{ik\lambda}\,dF(\lambda), \qquad \rho_k = \int e^{ik\lambda}\,d\Phi(\lambda) \qquad (k = 0, \pm 1, \pm 2, \ldots),$

where $F(\lambda)$ and $\Phi(\lambda)$ are real, bounded and never decreasing functions. We
shall now show that the parallelism between the two approaches amounts to a
mathematical isomorphism. On the one hand, we recall that A. Kolmogoroff
[3], [4] has introduced and studied the notion of a stationary sequence in Hilbert
space (let such a sequence be denoted $\{X_t\}$), and shown that a stationary
stochastic process $\{\xi_t\}$ forms a particular realization of this general, abstract
$\{X_t\}$. On the other hand the following elementary lemma shows that another
realization of $\{X_t\}$ may be formed on the basis of a stationary sequence $\{x_t\}$
such as (1).

Lemma. Let $\{x_t\}$ be a sequence of type (1) which satisfies the conditions (3) but
is arbitrary in other respects. We write

(7) $\{\mathbf{x}_t\} = \ldots,\ \mathbf{x}_{t-1},\ \mathbf{x}_t,\ \mathbf{x}_{t+1},\ \ldots,$

where $\mathbf{x}_t = \{x_t\}$, and $\mathbf{x}_{t+k}$ is obtained from $\mathbf{x}_t$ by replacing $x_t$ by $x_{t+k}$ for every t.

¹ As to $r_k$, see N. Wiener [8], who treats the case of a continuous time parameter t.
As to $\rho_k$, see H. Wold [9], p. 66, and A. Kolmogoroff [4], p. 5.





For the elements $\mathbf{x}_t$, let multiplication by a real or complex constant and addition
be defined by

$a\mathbf{x}_t = \{a x_t\}, \qquad \mathbf{x}_t + \mathbf{y}_t = \{x_t + y_t\},$

and let R be the class formed by all elements of the type

$c_{-n}\mathbf{x}_{t-n} + c_{-n+1}\mathbf{x}_{t-n+1} + \cdots + c_0\mathbf{x}_t + \cdots + c_n\mathbf{x}_{t+n},$

where n and $c_{-n}, \ldots, c_n$ are arbitrary. Let the inner product $(\mathbf{x}_t, \mathbf{y}_t)$ of two
elements $\mathbf{x}_t = \{x_t\}$, $\mathbf{y}_t = \{y_t\}$ in R be defined by

$(\mathbf{x}_t, \mathbf{y}_t) = M[x_t \bar{y}_t],$

and let R' be the closure of R.

Then R' is a space the dimension of which is denumerable or finite. In the
former case, R' satisfies the conditions of a Hilbert space H; in the latter case it can
be extended to a Hilbert space H. In any case, the relations

(8) $U\mathbf{x}_t = \mathbf{x}_{t+1}, \qquad -\infty < t < +\infty,$

define a unitary transformation U in H.

The first statement of the theorem is obvious. It is also easily verified that
R' satisfies the conditions A-C of an abstract Hilbert space as defined by
B. v. Sz. Nagy [7]. If R' is of finite dimension, a suitable extension will make R'
satisfy the conditions A-E of a Hilbert space as defined by M. H. Stone [6].
The transformation U is clearly unitary; it is also plain that the definition (8)
of U extends to the whole of H.

Now since both (4) and (7) are particular realizations of a stationary sequence
$\{X_t\}$ in Hilbert space, any theorem on such a sequence $\{X_t\}$ will give, as immediate
corollaries, similar theorems on a stationary sequence $\{x_t\}$ of type (1) and
on a stationary stochastic process $\{\xi_t\}$. Generally speaking, the former corollary
will involve averages of one or more functional sequences $\{x_t\}, \{y_t\}, \ldots$
over time t, while the latter will involve averages, for fixed t, over the realizations
of one or more stochastic processes $\{\xi_t\}, \{\eta_t\}, \ldots$.
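The two kinds of average just contrasted may be illustrated numerically. The following is only a sketch under assumptions not made in the text: a hypothetical real process $x_t = \phi x_{t-1} + e_t$ with independent standard normal $e_t$ is simulated, and the lag-one product average is formed once over time within a single realization and once, for fixed t, over many realizations. The names `ar1_path`, `time_average_lag1` and `ensemble_average_lag1` are illustrative only.

```python
import random


def ar1_path(phi, length, rng, burn_in=200):
    """Return one realization of x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1).

    The first burn_in values are discarded so that the start-up value 0
    has no visible influence (an assumption of this sketch).
    """
    x = 0.0
    path = []
    for step in range(burn_in + length):
        x = phi * x + rng.gauss(0.0, 1.0)
        if step >= burn_in:
            path.append(x)
    return path


def time_average_lag1(path):
    """Functional approach: average x_t * x_{t-1} over time t in one realization."""
    products = [path[t] * path[t - 1] for t in range(1, len(path))]
    return sum(products) / len(products)


def ensemble_average_lag1(phi, n_realizations, rng):
    """Stochastic approach: average xi_t * xi_{t-1} over realizations, t held fixed."""
    total = 0.0
    for _ in range(n_realizations):
        pair = ar1_path(phi, 2, rng, burn_in=50)
        total += pair[1] * pair[0]
    return total / n_realizations


phi = 0.5
rng = random.Random(12345)
theoretical = phi / (1.0 - phi ** 2)  # lag-one product moment of this AR(1) model
time_avg = time_average_lag1(ar1_path(phi, 200_000, rng))
ensemble_avg = ensemble_average_lag1(phi, 20_000, rng)
print(theoretical, time_avg, ensemble_avg)
```

For this particular model both averages should come out close to $\phi/(1-\phi^2)$; the agreement is a property of the assumed model and is not a substitute for the isomorphism argument above.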

Let us consider the following problem of prediction in the light of the isomorphism
established: Suppose the data (1) are known up to t - 1, say for
t - 1, t - 2, ..., t - n; what can then be said about $x_t$, or, more generally,
about $x_{t+k}$? One approach to the problem is to apply harmonic analysis to the
given data, and to extrapolate the function obtained up to the time point t + k.
Another approach, the one which we shall consider, is to approximate $x_{t+k}$
directly in terms of the given data. Confining ourselves to linear prediction,
and making use of n observations, the prediction formula will then be

(9) pred. $x_{t+k} = a_1^{(n,k)} x_{t-1} + a_2^{(n,k)} x_{t-2} + \cdots + a_n^{(n,k)} x_{t-n}.$

The error of prediction, also called the residual, is denoted

(10) $y_{t+k}^{(n,k)} = x_{t+k} - \text{pred. } x_{t+k}.$
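The least-squares determination of the prediction coefficients can be sketched as follows, for a real sequence and under the assumption that the predictor uses $x_{t-1}, \ldots, x_{t-n}$. Setting the averages $M[(x_{t+k} - \text{pred. } x_{t+k})\,x_{t-i}]$ to zero for $i = 1, \ldots, n$ gives the normal equations $\sum_j a_j r_{i-j} = r_{k+i}$, where $r_h = M[x_t x_{t-h}]$. The autocovariance $r_h = \phi^{|h|}/(1-\phi^2)$ used below is a hypothetical example, and the function names are illustrative.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def prediction_coefficients(autocov, n, k):
    """Coefficients a_1..a_n minimizing M[(x_{t+k} - sum_i a_i x_{t-i})^2].

    autocov(h) returns r_h = M[x_t x_{t-h}] for a real stationary sequence.
    """
    matrix = [[autocov(abs(i - j)) for j in range(1, n + 1)] for i in range(1, n + 1)]
    rhs = [autocov(k + i) for i in range(1, n + 1)]
    return solve_linear_system(matrix, rhs)


def residual_variance(autocov, coeffs, k):
    """Residual variance at the minimum: r_0 - sum_i a_i r_{k+i}."""
    return autocov(0) - sum(a * autocov(k + i) for i, a in enumerate(coeffs, start=1))


phi = 0.5


def ar1_autocov(h):
    """Hypothetical autocovariance r_h = phi^|h| / (1 - phi^2)."""
    return phi ** abs(h) / (1.0 - phi ** 2)


coeffs_k0 = prediction_coefficients(ar1_autocov, n=3, k=0)
coeffs_k2 = prediction_coefficients(ar1_autocov, n=3, k=2)
var_k0 = residual_variance(ar1_autocov, coeffs_k0, k=0)
var_k2 = residual_variance(ar1_autocov, coeffs_k2, k=2)
print(coeffs_k0, var_k0)
print(coeffs_k2, var_k2)
```

In this example only the most recent observation receives a non-zero weight, and the residual variance grows with the span of the forecast, as one would expect.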





Considering first the functional approach, we apply formula (9) for all t,
thus obtaining the residuals

$\ldots,\ y_{t-1}^{(n,k)},\ y_t^{(n,k)},\ y_{t+1}^{(n,k)},\ \ldots.$

In this approach we are led to regard the residual variance, i.e.

(11) $M[\,|y_{t+k}^{(n,k)}|^2\,],$

as a total measure of the accuracy of the prediction. If we follow the stochastic
approach, on the other hand, the formula (9) is applied, for fixed t, to all realizations
$\{x_t\}$ of the process $\{\xi_t\}$. In this case, the variance expectation,

(12) $E[\,|\xi_{t+k} - \text{pred. } \xi_{t+k}|^2\,],$

is regarded as a total measure of the accuracy of the prediction. The prediction
coefficients are determined by minimizing the expressions (11) and (12),
respectively.² It needs no further comment that the two lines of approach in
prediction theory will, thanks to the isomorphism indicated, lead to parallel
results.

In a study of stationary stochastic processes, the author has earlier found a
decomposition theorem which has a direct bearing on the prediction problem
(see [9], theorem 7). The main purpose of the present note is to develop the
corresponding decomposition for a functional sequence of the type (1). Two
theorems on this line are given in sections 3-4. The proofs are briefly indicated;
for further details, the reader is referred to my treatment of the stationary
process [9]. In section 5, the decomposition is applied to the prediction problem.
A few comments follow in section 6.

3. Auto-regression analysis of stationary time series. Let $\{x_t\}$ be an infinite
sequence (1) such that the conditions (3) are fulfilled. By (9)-(10), the residuals
$y_t^{(n,0)}$ will be well-defined for every n and t. According to elementary
properties of least square residuals, we have

(13) $M[y_t^{(n,0)}] = 0; \qquad M[y_t^{(n,0)} \bar{x}_{t-k}] = 0$ for $k = 1, 2, \ldots, n$.

Since the minimum variance cannot increase if we replace n by n + 1, we further
have

$M[\,|y_t^{(n-1,0)}|^2\,] \ge M[\,|y_t^{(n,0)}|^2\,] \ge M[\,|y_t^{(n+1,0)}|^2\,] \ge 0.$

Making $n \to \infty$, we infer that there is a constant $d^2$ such that

$\lim_{n\to\infty} M[\,|y_t^{(n,0)}|^2\,] = d^2.$


² For real sequences $\{x_t\}$ and $\{\xi_t\}$, this minimization is, of course, nothing else than the
method of least squares.





Making use of the Gram-Schmidt orthogonalization procedure it is further
possible to show that there exists a sequence $\{y_t\}$ such that

$\lim_{n\to\infty} M[\,|y_t^{(n,0)} - y_t|^2\,] = 0.$

In the usual terminology, the sequence $\{y_t\}$ is the limit in the mean of the
sequence $\{y_t^{(n,0)}\}$,

(14) $\operatorname{l.i.m.}_{n\to\infty}\ (\ldots,\ y_{t-1}^{(n,0)},\ y_t^{(n,0)},\ y_{t+1}^{(n,0)},\ \ldots) = (\ldots,\ y_{t-1},\ y_t,\ y_{t+1},\ \ldots).$

We may remark that (14) does not necessarily imply that $y_t^{(n,0)}$ will for a fixed
t have $y_t$ for an ordinary limit. We also note that the limiting sequence $\{y_t\}$
is not uniquely determined; for instance, the relation (14) remains valid if a
finite number of the elements $y_t$ are modified.
As is easily shown, we have

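(15) $\lim_{n\to\infty} M[\,|y_t^{(n,0)}|^2\,] = M[\,|y_t|^2\,] = M[y_t \bar{x}_t] = d^2 \ge 0$

and [cf. (13)]

(16) $M[y_t \bar{x}_{t-k}] = 0, \qquad k = 1, 2, \ldots.$

Moreover, the sequence $\{y_t\}$ is non-autocorrelated, i.e.

(17) $M[y_t \bar{y}_{t+k}] = 0, \qquad k = \pm 1, \pm 2, \ldots.$

In fact, observing that

$M[y_t \bar{y}_{t-k}] = \lim_{n\to\infty} M[y_t^{(n,0)} \bar{y}_{t-k}^{(n,0)}], \qquad k = 1, 2, \ldots,$

and supposing that (17) is not true, we would have

(18) $|M[y_t^{(\nu,0)} \bar{y}_{t-k}^{(\nu,0)}]| > a > 0,$

as $\nu$ runs through some sequence $n_1, n_2, \ldots$, such that $n_i \to \infty$. The relation
(18), however, would imply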

(19) ~cytf\^]<<fil-^a^) 

for some sufficiently large $\nu$ and for some suitable c. Since $y_t^{(\nu,0)} - c\,y_{t-k}^{(\nu,0)}$ is a
linear expression of the type appearing in the right hand member of (9), the
relation (19) is incompatible with (16). Thus (18) is not possible and (17) must
hold good.

Part of the above analysis is summed up in

Theorem 1. Given a time series $\{x_t\}$ which satisfies (3), let $\epsilon > 0$ be arbitrary.
Then an integer n and a set of coefficients $a_i^{(n,0)}$ exist for which (9) defines a residual
series $\{y_t^{(n,0)}\}$ such that

$M[y_t^{(n,0)}] = 0, \qquad |M[y_t^{(n,0)} \bar{y}_{t+k}^{(n,0)}]| < \epsilon, \qquad k = \pm 1, \pm 2, \ldots.$





4. A decomposition theorem. We shall first consider the special case where
(15) gives

(20) $M[\,|y_t|^2\,] = d^2 = 0,$

which is the same as

$\operatorname{l.i.m.}_{n\to\infty}\ (\ldots,\ y_{t-1}^{(n,0)},\ y_t^{(n,0)},\ \ldots) = (\ldots,\ 0,\ 0,\ \ldots).$

In this case we shall say that the sequence $\{x_t\}$ is deterministic,³ the interpretation
of this term being as follows: Given the sequence $\{x_t\}$ for all time points up
to and including t - 1, we may, by the use of a finite number of the given values,
predict $x_{t+k}$ with any accuracy; i.e., with a residual error of arbitrarily small
variance. This can be shown by induction. In fact, suppose that we are able
to predict each of $x_t, \ldots, x_{t+k-1}$ in such a way that the prediction error has a
variance $< \epsilon$, where $\epsilon$ is arbitrarily prescribed. Letting $\delta > 0$ be arbitrary, we
can then find a formula of type (9) which predicts $x_{t+k}$ in terms of the exact
values $x_{t+k-1}, \ldots, x_{t+k-n}$ and which gives a residual variance $< \delta/(k + 1)$.
Replacing here $x_{t+k-1}, \ldots, x_t$ by values so predicted that the residual variances
are less than $\delta/(k + 1)|a_1^{(n,0)}|^2, \ldots, \delta/(k + 1)|a_k^{(n,0)}|^2$, it is seen that the total
error of (9) will have a variance $< \delta$.

We proceed to the general case, $d^2 > 0$. According to the above analysis,
$y_t$ is that part of $x_t$ which cannot be linearly predicted from the previous observations
$x_{t-1}, x_{t-2}, \ldots$. In other words, each time point t brings in an unpredictable,
random-like element $y_t$ in the series $\{x_t\}$. Now while from (16) $y_t$ is
uncorrelated with the previous observations $x_{t-1}, x_{t-2}, \ldots$, it will in general be
correlated with the future observations $x_{t+1}, x_{t+2}, \ldots$. Thus the unpredictable
element $y_t$ may be regarded as influencing the future development
$x_{t+1}, x_{t+2}, \ldots$ of the series $\{x_t\}$. In order to examine this influence we proceed
as follows.

We approximate $x_t$ linearly in terms of $y_t, y_{t-1}, \ldots, y_{t-n}$, writing

$x_t = b_0 y_t + b_1 y_{t-1} + \cdots + b_n y_{t-n} + w_t^{(n)} = z_t^{(n)} + w_t^{(n)}.$

Determining the coefficients $b_k$ by minimizing

$M[\,|x_t - z_t^{(n)}|^2\,],$

the coefficients $b_k$ will thanks to (16)-(17) be independent of n. We obtain

$b_0 = 1; \qquad b_k = M[x_t \bar{y}_{t-k}]/d^2, \qquad k = 1, 2, \ldots.$

The sequence $\{z_t^{(n)}\}$ thus being determined for every n, it is further easily shown
that $\{z_t^{(n)}\}$ converges in the mean, say to $\{z_t\}$,

(21) $\operatorname{l.i.m.}_{n\to\infty}\ (\ldots,\ z_{t-1}^{(n)},\ z_t^{(n)},\ \ldots) = (\ldots,\ z_{t-1},\ z_t,\ \ldots).$


³ The term is due to J. Doob [1]; in my study [9] I used the term singular.





We may thus write

$z_t = y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots,$

where the sum converges in the mean. Finally, we write

(22) $x_t = z_t + u_t,$

which gives a decomposition of the series $\{x_t\}$ into two components $\{z_t\}$ and
$\{u_t\}$.

In the decomposition (22) the component $z_t$ is that part of $x_t$ which is linearly
built up by the unpredictable elements $y_t$ up to and including the time point
t. From (17) we know that the sequence $\{y_t\}$ is non-autocorrelated. It can
further be shown that the square modulus sum of the coefficients $b_k$ is convergent,

$\sum_{k=0}^{\infty} |b_k|^2 < \infty.$

As to the component $u_t$, it can be shown that $\{u_t\}$ is deterministic. More
precisely, we have

$\operatorname{l.i.m.}_{n\to\infty}\ \{u_t - (a_1^{(n,0)} u_{t-1} + \cdots + a_n^{(n,0)} u_{t-n})\} = \{0\},$

where the $a_i^{(n,0)}$ are the same as the minimizing coefficients of (9). It can further
be shown that $u_t$ is uncorrelated with $y_{t+k}$ and $z_{t+k}$, for all k,

$M[u_t \bar{y}_{t+k}] = M[u_t \bar{z}_{t+k}] = 0, \qquad (k = 0, \pm 1, \pm 2, \ldots).$

Summing up the above results, we obtain

Theorem 2. Any time series $\{x_t\}$ which satisfies the conditions (3) allows the
decomposition

(23) $\{x_t\} = \{z_t + u_t\},$

with

$\{z_t\} = \operatorname{l.i.m.}_{n\to\infty}\ \{y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_n y_{t-n}\},$

where the series $\{y_t\}$, $\{z_t\}$ and $\{u_t\}$ have the following properties:

A. The elements $y_t$, $z_t$ and $u_t$ are obtained from $x_t, x_{t-1}, \ldots$ by the limit
formulae (14), (21) and (22).

B. The series $\{y_t\}$ has zero mean,

$M[y_t] = 0,$

is non-autocorrelated,

$M[y_t \bar{y}_{t+k}] = 0, \qquad k = \pm 1, \pm 2, \ldots,$

and is uncorrelated with $\{x_{t-1}\}, \{x_{t-2}\}, \ldots,$

$M[y_t \bar{x}_{t-k}] = 0, \qquad k = 1, 2, \ldots.$





C. The series $\{u_t\}$ is uncorrelated with $\{y_t\}$ and $\{z_t\}$,

$M[u_t \bar{y}_{t+k}] = M[u_t \bar{z}_{t+k}] = 0, \qquad (k = 0, \pm 1, \pm 2, \ldots).$

D. The series $\{u_t\}$ is deterministic.

5. Application to the problem of prediction. In section 1 we have considered
the problem of predicting $x_{t+k}$ linearly in terms of $x_{t-1}, x_{t-2}, \ldots$. Now it is
seen that theorem 2 gives the following formula for predicting $x_{t+k}$ with an error
of minimal variance,

pred. $x_{t+k} = u_{t+k} + b_{k+1} y_{t-1} + b_{k+2} y_{t-2} + \cdots.$

In fact, by theorem 2, A and D, the right-hand member can be calculated with
any prescribed accuracy from a finite set of observations $x_{t-1}, \ldots, x_{t-N}$,
where N of course depends on the accuracy desired; on the other hand, the
prediction error being

$y_{t+k} + b_1 y_{t+k-1} + \cdots + b_k y_t,$

we infer from theorem 2 (B) that this error is of minimal variance,

$M[\,|x_{t+k} - \text{pred. } x_{t+k}|^2\,] = (1 + |b_1|^2 + \cdots + |b_k|^2)\, d^2.$
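The variance formula may be tried out on a simple special case. The sketch below assumes, hypothetically, a sequence of the form $x_t = \phi x_{t-1} + y_t$ with $|\phi| < 1$ and non-autocorrelated $y_t$; for such a sequence the deterministic component vanishes and $b_j = \phi^j$, a standard fact about this model that is not taken from the text. The function names are illustrative.

```python
def ma_coefficients_ar1(phi, count):
    """Return b_0, ..., b_count for the assumed model, where b_j = phi**j."""
    return [phi ** j for j in range(count + 1)]


def prediction_error_variance(b, k, d2):
    """Evaluate (1 + |b_1|^2 + ... + |b_k|^2) * d^2, using b[0] = 1."""
    return d2 * sum(abs(b[j]) ** 2 for j in range(k + 1))


phi = 0.5
d2 = 1.0  # variance of the unpredictable element y_t (assumed value)
b = ma_coefficients_ar1(phi, 20)
errors = [prediction_error_variance(b, k, d2) for k in range(21)]
total_variance = d2 / (1.0 - phi ** 2)  # M[|x_t|^2] for the assumed model

print(errors[0], errors[2], errors[20], total_variance)
```

The error variance starts at $d^2$ for k = 0 and increases towards the variance of the series itself as the span k of the prediction grows.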

6. Comments. As mentioned in section 2, the above theorem 2 is the analogue
of a theorem on the decomposition of a stationary stochastic process given by the
author previously (see [9], theorem 7). The starting point is then to apply
formula (9), not as above to the same sequence $\{x_t\}$ for varying t, but to all
realizations $\{x_t\}$ of the process, holding t fixed. The close connection between
the decomposition in the two approaches is further brought out by the following
theorem.

Theorem 3. Given a stochastic process,

$\ldots,\ \xi(t-1),\ \xi(t),\ \xi(t+1),\ \ldots,$

which is stationary in the sense of (5), let $\{x_t\}$ be an individual realization of this
process. Then $\{x_t\}$ will with probability 1 allow the decomposition of theorem 2.

In fact, according to the ergodic theorem of Birkhoff-Khintchine,⁴ the averages
(2) will exist with probability 1, and so theorem 3 follows from theorem 2. It
should be observed that the coefficients $b_k$ will in general vary from one realization
to another.

The theory of the decomposition (23) has been carried further in a brilliant
study by A. Kolmogoroff [3]. His analysis deals with the general case of a
stationary sequence in a Hilbert space. Establishing a decomposition of type (23)

⁴ See A. Kolmogoroff [2]. His proof refers to averages (2) of the special type where
$t_1$ is held fixed while $t_2 \to \infty$. According to the stationarity, however, the average exists,
and is the same, when $t_2$ is fixed and $t_1 \to -\infty$, and so the general average (2) will likewise
exist.





for such sequences Kolmogoroff also shows that the decomposition is uniquely
determined by properties corresponding to A-D. Making use of the powerful
methods of spectral analysis of linear transformations in Hilbert space, Kolmogoroff
further presents a highly developed theory of the decomposition.

As immediate corollaries of this general theory Kolmogoroff [4] obtains corresponding
results for a stationary stochastic process $\{\xi_t\}$ such as (4). Now thanks
to our lemma in section 2, similar theorems hold good for the functional sequence
(1). These results include detailed theorems on the connection between the
decomposition (23) and, on the other hand, the function $F(\lambda)$ which by (6)
generates the coefficients $r_k$. For example, it turns out that $\{x_t\}$ is completely
deterministic if the derivative $F'(\lambda)$ is constant over an interval of positive
measure. An explicit formula for the coefficients $b_k$ in terms of the function
$F(\lambda)$ may also be obtained. For proofs and further results, we must refer to
Kolmogoroff's papers [3]-[4].

The theory of the decomposition (23) has later been generalized in various
directions. V. Zasuhin [11] and J. Doob [1] have shown that the decomposition
applies to multi-dimensional stationary sequences. As shown by the present
author [10], the decomposition may be employed for the analysis of linear equation
systems with an infinite number of unknowns. This device makes use of
the decomposition of non-stationary sequences, a generalization indicated also
by M. Loève [5].

REFERENCES

[1] J. L. Doob, "The elementary Gaussian processes," Annals of Math. Stat., Vol. 15
(1944), pp. 229-282.

[2] A. Kolmogoroff, "Ein vereinfachter Beweis des Birkhoff-Khintchineschen Ergodensatzes,"
Rec. Math. (Mat. Sbornik) N.S., Vol. 2 (1937), pp. 367-368.

[3] A. Kolmogoroff, "Stationary sequences in Hilbert space (Russian)," Bolletin
Moskovskogo Gosudarstvenogo Universiteta, Matematika, Vol. 2 (1941).

[4] A. Kolmogoroff, "Interpolation und Extrapolation von stationären zufälligen
Folgen," Bull. Acad. Sci. URSS, Sér. Math., Vol. 5 (1941), pp. 3-14.

[5] M. Loève, "Fonctions aléatoires de second ordre," Revue Sci., Vol. 84 (1946), pp.
195-206.

[6] M. H. Stone, Linear Transformations in Hilbert Space, Amer. Math. Soc. Colloq.
Publ. 15, New York, 1932.

[7] B. v. Sz. Nagy, Spektraldarstellung linearer Transformationen des Hilbertschen Raumes,
Ergebn. Math. u. Grenzgeb., Vol. 5, Berlin, 1942.

[8] N. Wiener, "Generalized harmonic analysis," Acta Math., Vol. 55 (1933), pp. 117-258.

[9] H. Wold, A Study in the Analysis of Stationary Time Series, Dissertation (Stockholm),
Uppsala, 1938.

[10] H. Wold, "On infinite, non-negative definite, Hermitian matrices, and corresponding
linear equation systems," Arkiv Mat. Astr. Fys., Vol. 29A (1943).

[11] V. Zasuhin, "On the theory of multidimensional stationary random processes,"
C. R. (Doklady) Akad. Sci. (1941), pp. 435-437.



GENERALIZATION TO N DIMENSIONS OF INEQUALITIES OF 
THE TCHEBYCHEFF TYPE 

By Burton H. Camp

Wesleyan University

1. Summary. The Tchebycheff statistical inequality and its generalizations
are further generalized so as to apply equally well to n-dimensional probability
distributions. Comparisons may be made with other generalizations [1], [2]
that have been developed recently for the two-dimensional case. The inequalities
given in this paper are generally as close as the most favorable corresponding
inequalities that exist for the one-dimensional case and in many simple cases
they are closer than those that have been given heretofore for two dimensions.
In a special case the upper bound of our inequality is actually attained. The
theory contains also a less important generalization in one dimension.

2. Introduction. It is necessary to introduce a new kind of moment, to be
called a "contour" moment, which is a generalization of the usual one-dimensional
moment. If we consider first a simple two-dimensional frequency surface,
$y = f(t_1, t_2)$, we may think of y as a function of a single variable, x, where x is the
area of the contour on that surface at the y level. This function may be defined
so that it is monotonic decreasing and has other simple characteristics. Then
we define the rth contour moment as

$\mu_r = \int_0^{\infty} x^r y\,dx,$

and then the generalization of the Tchebycheff-type inequalities follows easily.
This theory can be applied equally well to almost any single-valued function of
n variables which is limited and integrable in the sense of Lebesgue. Therefore
the theory will be enunciated initially in a very general form. The reasons for
the initial statements will be indicated only briefly because a detailed discussion
of quite similar ideas has been given by this author in another paper [3], where
he applied the same general principle to obtain generalizations of certain theorems
in integration theory.

3. Preliminary theory. Let $f(t_1, \ldots, t_n)$ be a probability distribution with
limited upper bound L and defined at all points of infinite n-space, which is to be
denoted by T, dT being the Lebesgue measure of a differential element. We
thus assume that: $0 \le f(t_1, \ldots, t_n) \le L$, f has a Lebesgue integral in T, and

$\int_T f\,dT = 1.$

Let $Q_\lambda$ denote the set of points in T where $f > \lambda$, $(0 \le \lambda \le L)$, and let $x_\lambda$ be the




GENERALIZATION OF INEQUALITIES 




measure of $Q_\lambda$, for $Q_\lambda$ is known to be measurable. Therefore $x_L = 0$, $x_0 \le \infty$,
and for each $\lambda$ there exists a unique $Q_\lambda$ and therefore a unique $x_\lambda$. This means
that x is a single-valued function of $\lambda$ and that it exists (or is positive infinite)
for every value of $\lambda$ in the interval $(0 \le \lambda \le L)$. If $\lambda' > \lambda$, $x_{\lambda'} \le x_\lambda$. This
means that x is a monotonic decreasing function of $\lambda$. It need not be continuous;
that is, it may be asymptotic to the line $\lambda = 0$, and it may have finite discontinuities
or "jumps". Also there may be an enumerably infinite number of $\lambda$
intervals in which x is constant. It follows that $\lambda$ is a monotonic decreasing
function of x in the interval $(0 \le x \le x_0 \le \infty)$, but it may not exist (in intervals
where x has jumps), and it may be multiple valued (at points where x is constant).
We now let $y(x) = \lambda_x$, except that: if $\lambda$ is multiple valued at any point x we
let y have the minimum value of $\lambda$ at that point. Any other value would do
equally well because the total measure of such points is zero and they can be left
out of the integrals that follow. If $\lambda$ does not exist in an x interval, we let y have
in that interval the value which it has at the beginning of the interval. This is a
$\lambda$ point where x has a jump. We have thus defined y as a single valued monotonic
decreasing function of x in the interval $(0 \le x \le x_0 \le \infty)$ and $0 \le y \le L$.
It follows from Lebesgue's theory that:

$\int_0^{x_\lambda} y(x)\,dx = \int_{Q_\lambda} f\,dT \quad (0 < \lambda \le L), \qquad \int_0^{\infty} y(x)\,dx = \int_T f\,dT = 1.$

Finally we restrict our function f so that there shall be at most a finite number
of points x where $\lambda$ is multiple valued (intervals of $\lambda$ over which x is constant),
and hence the number of discontinuities of y will be finite. This restriction may
not be necessary but it is convenient and not embarrassing in applications.
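The construction of y as a function of x can be imitated on a grid. The sketch below is only a numerical illustration under stated assumptions: the density is the two-dimensional normal law with unit standard deviations, it is sampled on square cells of equal measure, and sorting the cell values in decreasing order plays the part of passing from $f$ to $y(x)$, with x the accumulated measure. The names `contour_function` and `contour_moment` are illustrative.

```python
import math


def contour_function(density_values, cell_measure):
    """Return samples (x, y) of the decreasing function y(x).

    density_values holds the values of f on cells of equal measure.
    After sorting, the i-th largest value is attached to the midpoint
    measure (i + 0.5) * cell_measure of the region where f is at least that large.
    """
    y = sorted(density_values, reverse=True)
    x = [(i + 0.5) * cell_measure for i in range(len(y))]
    return x, y


def contour_moment(x, y, cell_measure, r):
    """Approximate the integral of x**r * y(x) dx by a midpoint sum."""
    return sum((xi ** r) * yi for xi, yi in zip(x, y)) * cell_measure


step = 0.1
half_width = 8.0  # the density is negligible outside this square
n_cells = int(round(2 * half_width / step))
centres = [-half_width + (i + 0.5) * step for i in range(n_cells)]
values = [math.exp(-(s * s + t * t) / 2.0) / (2.0 * math.pi)
          for s in centres for t in centres]

x, y = contour_function(values, step * step)
mu0 = contour_moment(x, y, step * step, 0)
mu1 = contour_moment(x, y, step * step, 1)
print(mu0, mu1, 2.0 * math.pi)
```

For this density the integral of y over x should be close to 1, and the first moment of x with weight y should be close to $2\pi$, up to discretization error.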

4. Contour moments. The rth contour moment is denoted by $\mu_r$. The contour
standard deviation is denoted by $\sigma$. We define

$\mu_r = \int_0^{\infty} x^r y\,dx.$

It follows that $\mu_0 = 1$, and that

$\mu_2 = \sigma^2 = \int_0^{\infty} x^2 y\,dx.$

We shall also let $\alpha_{2r} = \mu_{2r}/\sigma^{2r}$. We now assume that r is either zero or a positive
integer, but in much of what follows this assumption is not necessary.

Example 1. Let $f(t_1, t_2) = (2\pi)^{-1} e^{-(t_1^2 + t_2^2)/2}$. The equation $f(t_1, t_2) = \lambda$
defines a circular contour whose area is $x = \pi(t_1^2 + t_2^2) = -2\pi \log 2\pi\lambda$. Hence
$y = \lambda = (2\pi)^{-1} e^{-x/2\pi}$ and

$\mu_r = \int_0^{\infty} x^r y\,dx = (2\pi)^r\, r!, \qquad \sigma^2 = 8\pi^2, \qquad \alpha_{2r} = (2r)!/2^r.$

5. Contour moments and one-dimensional moments. If n = 1 and if $f(t_1) = f(-t_1)$,
then

$\mu_{2r} = \int_0^{\infty} x^{2r} y\,dx = \int_{-\infty}^{\infty} (2t)^{2r} f(t)\,dt = M_{2r} \cdot 2^{2r},$





BURTON H. CAMP


where $M_{2r}$ is an ordinary moment. Hence also $\sigma = 2\sigma_1$, $\alpha_{2r} = \mu_{2r}/\sigma^{2r} = M_{2r} 2^{2r}/(2\sigma_1)^{2r}$
$= a_{2r}$. It is to be noticed that, although $\alpha_{2r} = a_{2r}$, $\mu_{2r} \ne M_{2r}$. One
could alter the definition so that these two moments would be equal by inserting
into the definition of contour moments the factor $2^n$, using $x/2^n$ in place of x,
but this would introduce a slight complication for a doubtful advantage. Although
it would seem to be desirable to define the even contour moments $\mu_{2r}$
so that they would become the ordinary moments $M_{2r}$ in the symmetrical
one-dimensional case, such a definition would not make the two corresponding odd
moments equal, and it would not make the two even moments equal in the
non-symmetrical one-dimensional case. So it seems better not to introduce this
factor $2^n$, but to take note of the relationships that hold in the one-dimensional
case.

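Theorem. Let

$P_\delta = \int_0^{\delta\sigma} y\,dx = \int_{Q_\lambda} f\,dT,$

where $\lambda$ is such that $x_\lambda = \delta\sigma$. Then

$1 - P_\delta \le \frac{\alpha_{2r}}{\delta^{2r}} \cdot \frac{1}{\left[1 + \frac{1}{2r}\right]^{2r}}.$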


Coiiollahy 1. In parlicular \ — Pt 

Corollary 2. If r = 1, $1 - P_\delta \le 4/9\delta^2$. This theorem and these two
corollaries are minor generalizations even of the corresponding one-dimensional
inequalities, for it is no longer assumed that the probability distribution $f(t)$
has but one mode.

Proof of Theorem. Let $g(x) = y(x)$ if $0 \le x \le x_0 \le \infty$, let $g(x) = y(-x)$
if $-\infty \le -x_0 \le x \le 0$, and let $g(x) = 0$ elsewhere in $(-\infty, \infty)$. Then $g(x)$
has all the properties explicitly required of $f(x)$ in a former paper by this author
[4] in which this theorem was proved for the one-dimensional case. That is:
$g(x)$ is a frequency function whose mean is zero, and

$\int_{-\infty}^{\infty} g(x)\,dx = 2, \quad \text{and} \quad \int_{\delta\sigma}^{\infty} g(x)\,dx$

is the probability that $|x| > \delta\sigma$; $g(x)$ is a monotonic decreasing function of
$|x|$ for all values of x; and is symmetrical with respect to the central ordinate.
Therefore, transforming the symbols of that paper to our present notation, we
have

L ^ »■'/(*' ■ 


where 





Similarly Msr = M 2 r > “sr = a 2 r, and finally 



This proves the theorem except that there is one exceptional case that requires
attention. In the proof of the theorem in the paper just referred to the author
assumed that the function corresponding to our present $g(x)$ was continuous.
At that time a "frequency" function was often thought of as determined by a
smooth curve approximating a histogram and implied even the existence of
derivatives, and so continuity was not added to the explicit requirement that
the function be a "frequency" function, but this condition was explicitly introduced
in the lemma on which the proof of the theorem was based, and so we do
now have to consider separately the case where y, and hence g, may have a finite
number of jumps. It is quite easy to handle this case as the limiting form of a
continuous case. In that lemma it was also required that $d^2Q/dt^2$ should exist
and be non-negative, which would imply that we now have to make the requirement
that y (corresponding to $dQ/dt$) shall have a non-negative first derivative.
On examination of the proof, however, it will be observed that this is not necessary,
since y is monotonic decreasing and continuous. That is, in the lemma the
only use made of the condition, $d^2Q/dt^2 \ge 0$, was that the function $Q(t)$ should
determine a curve which would be never concave down. But for this it is
sufficient that $dQ/dt$ be continuous and monotonic increasing, and these conditions
are now satisfied by the function which plays the role of Q in the present
discussion. This function will now be defined as


/: 


7(2;) dx. 


Let $\gamma(x)$ be a continuous function defined as equal to $g(x)$ except in the neighborhood
of the points of finite discontinuity. Near such points it is to be so defined
that it shall have all the properties just required of $g(x)$, and in addition
so that, for any prescribed $R > 1$ and $\epsilon > 0$,

jf yix) dx = x^'gix) dx + Tir, (l^r^R), 




dx -f 1 }, 


where | ,,, ) < e. It is obvious that such a definition of 7 may be made in 

many ways, and one of them is by making use of a linear function in the neighborhood
of each point of discontinuity. Since $\gamma(x)$ now satisfies all the conditions
of the author's earlier paper the corresponding inequality is true:






where


Hence 


<ri 


f xSdx. 

JQ 




Let $\epsilon$ approach zero and we have, as desired:


i-p.iK 


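Example 2. Let

$f(t_1, \ldots, t_n) = A \exp\left[-\frac{1}{2}\left(\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2}\right)\right], \qquad A = (2\pi)^{-n/2} (\sigma_1 \cdots \sigma_n)^{-1}.$

This is a form into which the general correlation solid may be put by means of a
linear transformation. Since $P_\delta$ is a ratio between two parts of such a solid and
since this ratio is preserved under a linear transformation, the more general case
may be transformed into this one, or even, as will appear shortly, into the simpler
one where all the standard deviations are unity. If $f = \lambda$ the contour is the
ellipsoid,

$\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2} = -2 \log \frac{\lambda}{A}.$

The volume of this ellipsoid is

$x = h\left(-2 \log \frac{\lambda}{A}\right)^{n/2}, \qquad h = V_n \sigma_1 \cdots \sigma_n, \qquad V_n = \frac{2\pi^{n/2}}{n\Gamma(n/2)}.$

Hence $y = A \exp\left[-\frac{1}{2}(x/h)^{2/n}\right]$,

$\mu_r = \int_0^{\infty} x^r y\,dx = 2^{nr/2} \left(\frac{2\pi^{n/2}}{n}\right)^r (\sigma_1 \cdots \sigma_n)^r\, \frac{\Gamma\left(\frac{nr + n}{2}\right)}{[\Gamma(n/2)]^{r+1}}.$

Putting r = 2 we obtain

$\sigma^2 = \frac{\pi^n 2^{n+2} (\sigma_1 \cdots \sigma_n)^2}{n^2} \cdot \frac{\Gamma(3n/2)}{[\Gamma(n/2)]^3},$

and then

$\alpha_{2r} = \frac{\Gamma\left(\frac{2rn + n}{2}\right)}{\Gamma(n/2)} \left[\frac{\Gamma(n/2)}{\Gamma(3n/2)}\right]^r.$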






Our inequality becomes: $1 - P_\delta \le J$, where

$J = \frac{\alpha_{2r}}{\delta^{2r}} \left(\frac{2r}{2r + 1}\right)^{2r}$, or $J = 1$, whichever is smaller.

Typical numerical values of $\alpha_{2r}$ and of J are given in Tables I and II.


TABLE I

Values of $\alpha_{2r}$

 n   $\alpha_{2r}$                                   $\alpha_2$   $\alpha_4$   $\alpha_6$
 1   1·3···(2r - 1)                            1     3       15
 2   (2r)!/2^r                                 1     6       90
 3   3·5·7···(6r + 1)/(3·5·7)^r                1     12.26   566
 4   (4r + 1)!/(5!)^r                          1     25.20   3604


TABLE II

Values of J

 $\delta$   n   r   J
 1   1   1   0.444
         2   1.000
 1   2   1   0.444
         2   1.000
 2   1   1   0.111
         2   0.077
         3   0.093
 3   1   1
         2
         3
         4
         5
 3   2   1   0.049
         2   0.030
         3   0.049
 3   3   1   0.049
         2   0.062
         3   0.308
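The entries of Tables I and II can be checked mechanically. The sketch below assumes the gamma-function expression for $\alpha_{2r}$ of the n-dimensional normal law, $\alpha_{2r} = \Gamma((2rn+n)/2)\,[\Gamma(n/2)]^{r-1}/[\Gamma(3n/2)]^r$, and the bound $J = \min\{1,\ (\alpha_{2r}/\delta^{2r})(2r/(2r+1))^{2r}\}$; the function names `alpha_2r` and `bound_j` are illustrative and not part of the paper.

```python
import math


def alpha_2r(n, r):
    """alpha_{2r} for the n-dimensional normal law, computed through log-gamma."""
    log_alpha = (math.lgamma(n * (2 * r + 1) / 2.0)
                 + (r - 1) * math.lgamma(n / 2.0)
                 - r * math.lgamma(3 * n / 2.0))
    return math.exp(log_alpha)


def bound_j(delta, n, r):
    """Upper bound J for 1 - P_delta, capped at 1."""
    bound = alpha_2r(n, r) / delta ** (2 * r) * (2 * r / (2 * r + 1.0)) ** (2 * r)
    return min(bound, 1.0)


# Rows of Table I: alpha_2, alpha_4, alpha_6 for n = 1, ..., 4.
for n in range(1, 5):
    print(n, [round(alpha_2r(n, r), 2) for r in (1, 2, 3)])

# A few entries of Table II, as (delta, n, r, J).
for delta, n, r in [(1, 1, 1), (2, 1, 2), (3, 2, 2), (3, 3, 3)]:
    print(delta, n, r, round(bound_j(delta, n, r), 3))
```

Working with logarithms of gamma functions avoids overflow for larger n and r; for the small values tabulated here the direct factorial forms of Table I would serve equally well.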














Let us now compare J with the true value of $(1 - P_\delta)$ in one of these cases,
viz., when $\delta = 3$ and n = 3. The true value is given by

$1 - P_\delta = \int_{\delta\sigma}^{\infty} y\,dx,$

where now $\sigma = 4\pi\sqrt{105}\,(\sigma_1\sigma_2\sigma_3)/3$, $h = 4\pi(\sigma_1\sigma_2\sigma_3)/3$. The integral may be
evaluated by means of the transformation $t = (x/h)^{1/3}$ and a table of the
resulting integral. We obtain: $1 - P_\delta = 0.0205$. This is the true value
to be compared with the approximation, $J = 0.049$. The closeness of this
approximation is similar to that which may be obtained for the normal law by
using the corresponding inequalities for one dimension. To illustrate this we
find from the usual tables that, if for the normal law $1 - P_\delta = 0.0205$, $\delta = 2.32$.
Hence the corresponding inequality is (for r = 2): $1 - P_\delta \le 0.042$.

We shall now show that the upper bound of our inequality is actually attained
in a special case. Let $f(t_1, \ldots, t_n) = 2^{-n}$ in the region $(-1 \le t_1, \ldots, t_n \le 1)$,
and let $f = 0$ elsewhere. For this case we shall have $x = 0$ when $\lambda = 2^{-n}$, and
$x = 2^n$ when $0 \le \lambda < 2^{-n}$. Therefore $y = 2^{-n}$ if $0 \le x < 2^n$, and $y = 0$
if $2^n \le x$. Hence $\sigma^2 = 2^{2n}/3$, $\mu_0 = 1$, and the true value of $(1 - P_\delta)$ is
$1 - \delta/\sqrt{3}$; and when $\delta = 2/\sqrt{3}$, this true value is 1/3. The appropriate
inequality is: $1 - P_\delta \le 4/9\delta^2$, and when $\delta = 2/\sqrt{3}$ the right hand side of this
inequality is also equal to 1/3. These relationships are true for all values of n.
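The case of equality can be confirmed by direct arithmetic. The following sketch merely evaluates the quantities of the preceding paragraph for the uniform density on the cube; the function names are illustrative.

```python
import math


def true_tail_uniform_cube(delta, n):
    """1 - P_delta for f = 2**-n on the cube [-1, 1]**n.

    Here y = 2**-n for 0 <= x < 2**n, so sigma = 2**n / sqrt(3) and
    P_delta = delta * sigma * 2**-n, capped at 1.
    """
    sigma = 2.0 ** n / math.sqrt(3.0)
    p_delta = min(1.0, delta * sigma * 2.0 ** (-n))
    return 1.0 - p_delta


def corollary_bound(delta):
    """Right-hand side 4 / (9 delta^2) of the inequality for r = 1, capped at 1."""
    return min(1.0, 4.0 / (9.0 * delta ** 2))


delta_star = 2.0 / math.sqrt(3.0)
for n in (1, 2, 5):
    print(n, true_tail_uniform_cube(delta_star, n), corollary_bound(delta_star))

# The bound is never violated on a grid of other values of delta.
grid = [0.1 * i for i in range(1, 31)]
gaps = [corollary_bound(d) - true_tail_uniform_cube(d, 3) for d in grid]
print(min(gaps))
```

Both sides equal 1/3 at $\delta = 2/\sqrt{3}$ whatever the dimension, and elsewhere the bound lies above the true value.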

REFERENCES

[1] P. O. Berge, "A note on a form of Tchebycheff's inequality for two variables,"
Biometrika, Vol. 29 (1937), pp. 405-406.

[2] Z. W. Birnbaum, J. Raymond, and H. S. Zuckerman, "A generalization of Tchebycheff's
inequality to two dimensions," Annals of Math. Stat., Vol. 18 (1947), pp.
70-79.

[3] B. H. Camp, "A method of extending to multiple integrals properties of simple
integrals," Math. Ann., Vol. 75 (1914), pp. 274-289.

[4] B. H. Camp, "A new generalization of Tchebycheff's statistical inequality," Amer.
Math. Soc. Bull., Vol. 28 (1922), pp. 427-432. See also: "A note on Narumi's
paper," Biometrika, Vol. 15 (1923), pp. 421-423.



BOUNDARIES OF MINIMUM SIZE IN BINOMIAL SAMPLING 

By R. L. Plackett
University of Liverpool

1. Introduction. Much attention has recently been concentrated on the problems
arising when sampling a binomial population, since this is thought to form a
suitable model for certain industrial and biological procedures. A general
discussion of such procedures as applied in industry has been given by Barnard
[2] and various particular cases have received detailed treatment by Burman [3],
Stockman and Armitage [6], and Anscombe [1]. Unbiased estimation of the
population parameter (the "fraction defective") has been investigated by
Girshick, Mosteller and Savage [4] and Wolfowitz [7]. A paper by Haldane [5]
is also relevant.

For such sampling procedures it is necessary to find the probabilities of accepting
or rejecting material with a particular fraction defective; to calculate the
average sample size; and to form an estimate of the fraction defective when
sampling terminates. All three characteristics may be expressed in terms of
quantities N(x, y), defined in section 3, so that once these are known, the fundamental
properties of the scheme are known.

Here we present a method for determining the N(x, y); investigate the conditions
under which it is valid; relate the method to the estimation problem; and
exemplify its application. The schemes to which the method can successfully
be applied are of a special type (to which the title refers) and include all inspection
procedures with a finite upper limit to the sample size likely to be used in
practice. Other schemes, when dissected in a manner similar to that used by
Stockman and Armitage, can doubtless be formulated as an aggregate of the
special types.

2. Nomenclature. Our nomenclature differs in some respects from that of
Girshick, Mosteller and Savage, although the same collection of terms is
employed. References to their paper should therefore be followed by a comparison
of the terminology.

Taking a sample of one from a binomial population consists in observing either
of two events, whose probabilities are $p$ and $1 - p$ ($p \neq 0$ or $1$). The results
of successive samples of one can be represented by the path of a particle in a
two-dimensional lattice of points with non-negative integer co-ordinates. This
particle starts at the origin O and at any point $(x, y)$ travels to $(x + 1, y)$ if the
event whose probability is $p$ has occurred, otherwise to $(x, y + 1)$. Sampling
terminates when the particle reaches a boundary point, and the set of such
points is denoted by B. Any point which can be reached during sampling,
including the boundary points, is accessible, and any path from the origin to a
point of B which can be traversed during sampling is admissible; all other points
are inaccessible and all other paths inadmissible. The index of a point is the sum
of its coordinates.




It will probably help to note in particular that whereas Girshick, Mosteller and
Savage used $p$ to correspond to events causing the $y$ co-ordinate to increase we
use it for $x$.

3. Determination of N(x, y). The set B determines the sampling scheme and
we are concerned with schemes in which all points of index greater than $n$,
the finite maximum index of points in B, are inaccessible. This condition
guarantees that if $N(x, y)$ denotes the number of admissible paths from the origin to
a point $(x, y)$ of B,

$$\sum_{B} N(x, y)\, p^{x} (1 - p)^{y} = 1,$$

the summation being over all boundary points. Consequently, to determine
$N(x, y)$ equate coefficients of powers of $p$ in this identity, the coefficient of $p^{0}$ in the left
hand side being 1 and all others zero. When all the $N(x, y)$ are known, the
probability of reaching any subset of B can be calculated and the characteristics
of the scheme found.
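
As an informal check of this identity, the following Python sketch counts admissible paths for a small hypothetical scheme (sampling stops as soon as x = 2 or y = 3; this boundary is an illustration and is not taken from the paper) and confirms that the boundary probabilities sum to one. The names `count_paths` and `is_boundary` are assumptions introduced here, and direct path counting is used in place of the method of equating coefficients.

```python
from fractions import Fraction


def count_paths(is_boundary, max_index):
    """Count admissible paths N(x, y) from the origin to each boundary point.

    is_boundary(x, y) marks the points of B; max_index is n, the largest
    index x + y of any boundary point.
    """
    reach = {(0, 0): 1}   # accessible points of the current index, with path counts
    counts = {}           # N(x, y) for the boundary points reached so far
    for _ in range(max_index + 1):
        nxt = {}
        for (x, y), ways in reach.items():
            if is_boundary(x, y):
                counts[(x, y)] = ways   # sampling terminates here
                continue
            for step in ((x + 1, y), (x, y + 1)):
                nxt[step] = nxt.get(step, 0) + ways
        reach = nxt
    if reach:
        raise ValueError("points of index greater than max_index are accessible")
    return counts


# Hypothetical scheme: stop at x = 2 or y = 3, so the maximum index is n = 4.
N = count_paths(lambda x, y: x == 2 or y == 3, max_index=4)
p = Fraction(3, 10)
total = sum(c * p**x * (1 - p)**y for (x, y), c in N.items())
mean_size = sum((x + y) * c * p**x * (1 - p)**y for (x, y), c in N.items())
print(N, total, mean_size)
```

Exact rational arithmetic is used so that the identity can be tested for equality rather than to within rounding error; the same counts give the average sample size as a weighted sum over B.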

Sometimes it will be convenient to use

$$\sum_{B} N(x, y)\, q^{y} (1 - q)^{x} = 1,$$

where $q = 1 - p$, but the resulting set of equations cannot be independent of the
first set since if

$$\sum_{i=0}^{n} a_i p^{i} \equiv \sum_{i=0}^{n} b_i (1 - p)^{i},$$

then

$$b_j = (-1)^{j} \sum_{i=j}^{n} \binom{i}{j} a_i .$$

The polynomial in either $p$ or $q$ is of degree $n$; the application of this method
alone is therefore limited to boundaries containing at most $(n + 1)$ points,
otherwise the number of unknowns exceeds the number of equations for them.

4. Properties of the boundary.

THEOREM 1. If $n$ is the maximum index of points in B and if any point of
greater index is inaccessible, then B contains at least $n + 1$ points.

There must be at least two boundary points of index $n$, for any such point
$(a_n, b_n)$ must be approached from $(a_n - 1, b_n)$ or $(a_n, b_n - 1)$; in which case
either $(a_n - 1, b_n + 1)$ or $(a_n + 1, b_n - 1)$ is a boundary point. Let P be any
one of these points. At least one admissible path exists from O to P; suppose
one such path to consist of the points $(a_0, b_0), (a_1, b_1), \ldots, (a_n, b_n)$ where
$a_k + b_k = k$ $(k = 0, 1, 2, \ldots, n)$. It is clear that one or more boundary points exist
on the line $x = a_k$ having $y > b_k$, for otherwise the particle could travel
indefinitely along this line; similarly one or more exist on $y = b_k$ with $x > a_k$, and if
there is just one on each they cannot be identical unless $k = n$, since $(a_k, b_k)$ is
not then a boundary point. Initially $(a_0, b_0)$ contributes two boundary points;
since then either $a_{k+1} = a_k$ and $b_{k+1} > b_k$ or $a_{k+1} > a_k$ and $b_{k+1} = b_k$ it follows
that each succeeding point up to and including $(a_{n-1}, b_{n-1})$ contributes at least
one more; the point $(a_n, b_n)$ is counted as soon as $x$ reaches $a_n$ or $y$ reaches $b_n$,
whichever occurs first. Consequently there are at least $n + 1$ boundary points.

Conversely, if the boundary contains $n + 1$ points whose maximum index is
$m$, such that any point of greater index is inaccessible, then $m \le n$. For suppose
$m > n$ and apply the preceding result.

An important class of boundaries therefore comprises those with the minimum 
number of points necessary to attain a given maximum index; they may con¬ 
veniently be termed boundaries of minimum size and for them alone the method 
of equating coefficients yields the number of equations equal to the number of 
unknowns, the first being otherwise less than the second. 

If there are exactly $n + 1$ boundary points then $(a_1, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1})$
must each contribute to just one; since $a_{k+1} = a_k$ or $a_k + 1$ there is one
point of B on each of the lines $x = 0, x = 1, \ldots, x = a_n$ and this set of points
$(0, d_0), (1, d_1), \ldots, (a_n, b_n)$ can be denoted by U, the upper part of the boundary.
Clearly $d_{k+1} \ge d_k - 1$, for otherwise more than one boundary point is required
on the line $x = k + 1$. Similarly, there must be a second group of points of B,
$(c_0, 0), (c_1, 1), \ldots, (a_n, b_n)$ with $c_{k+1} \ge c_k - 1$, forming the lower boundary L;
and all $(n + 1)$ points have now been enumerated, the point P belonging to both
U and L. The characteristic of such sets B is that the sequences U and L both
have monotonically non-decreasing index; the special case of sequences with
monotonically increasing index provides the rejection and acceptance boundaries
of non-rectifying industrial inspection procedures. (The difference between
rectifying and non-rectifying procedures is clearly stated in the introduction to
Anscombe [1].)

THEOREM 2. For boundaries of minimum size any two accessible points not in B
of the same index $m$ cannot be separated on the line $x + y = m$ by boundary or
inaccessible points. In the terminology of Girshick, Mosteller and Savage the
accessible points not in B form a simple region.

Let $Q(x_1, y_1)$ and $R(x_2, y_2)$ be any two such accessible points of index $m$ and
suppose $x_1 < x_2$. There are two possibilities: $(a_m, b_m)$ does or does not lie
between Q and R.

(i) $(a_m, b_m)$ lies between Q and R, i.e. $x_1 < a_m < x_2$. In this case there must
be points of B at $Q'(x_1, Y_1)$ with $Y_1 > y_1$ and at $R'(X_2, y_2)$ with $X_2 > x_2$. The
boundary from $Q'$ to P and from $R'$ to P has non-decreasing index; hence all
points of U on the lines $x = x_1, x = x_1 + 1, \ldots, x = a_m - 1$ have index at
least $x_1 + Y_1 > m$; similarly all points of L on the lines $y = y_2, y = y_2 + 1, \ldots,$
$y = b_m - 1$ have index at least $X_2 + y_2 > m$. By definition of the boundary
there are no additional points of B on either group of lines between the path OP
and the line $x + y = n$, so the proof of the theorem is completed.

(ii) If $x_1 > a_m$ or $x_2 < a_m$ the proof is precisely analogous to that given in (i).





5. Justification of the method. THEOREM 3. For boundaries of minimum size
the equations for $N(x, y)$ are soluble and of rank $n + 1$.

To prove this we give a general method of solution for the system of equations,
using powers of $p$ and $q$ alternately: as already remarked, this is equivalent to
using the equations from the coefficients of powers of $p$ only. In the first place,
note that the coefficient of $q^{u}$ is a linear combination of numbers $N(x, y)$ with
$x + y \ge u$ and $y \le u$; and the coefficient of $p^{t}$ has $x + y \ge t$ and $x \le t$.

Let $s = \mathrm{Min}(d_0, d_1, d_2, \ldots, b_n) - 1$.

Then from the coefficients of $q^{0}, q^{1}, \ldots, q^{s}$ can successively be determined
$N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$, the matrix of the equations being triangular with
ones in the main diagonal. The points in U at $(r_1, s + 1), (r_2, s + 1), \ldots$ now
appear in the coefficients of $q^{s+1}, \ldots$ and complicate the solution.

Let $r = \mathrm{Max}(r_1, r_2, \ldots)$.

If either $(r, d_r)$ or $(c_s, s)$ is the point P then all the remaining $N(x, y)$ can
successively be determined from the coefficients of powers of $p$ when the values
of $N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$ are substituted in the equations. Otherwise
the path OP for $y \ge s + 1$ must have $x \ge r + 1$ so that all points of L on $y \ge s + 1$
have $x \ge r + 2$, i.e. any point of L on $x = 0, x = 1, \ldots, x = r$ has
$y \le s$; for such points the number of admissible paths is now known. Therefore
from the coefficients of $p^{0}, p^{1}, \ldots, p^{r}$ can successively be determined $N(0, d_0),$
$N(1, d_1), \ldots, N(r, d_r)$, the matrix of these unknowns being again triangular;
in particular $N(r_1, s + 1), N(r_2, s + 1), \ldots$ can now be found.

Let $s_1 = \mathrm{Min}(d_{r+1}, d_{r+2}, \ldots, b_n) - 1$, so that $s_1 > s$. The coefficients
of $q^{s+1}, \ldots, q^{s_1}$ give successively $N(c_{s+1}, s + 1), N(c_{s+2}, s + 2), \ldots,$
$N(c_{s_1}, s_1)$. For the points in U at $(r_{11}, s_1 + 1), (r_{12}, s_1 + 1), \ldots$, let
$r_1 = \mathrm{Max}(r_{11}, r_{12}, \ldots)$.

Since there is only one point of U on each line $x$ = constant, $r_1 > r$. As
before, if either $(r_1, d_{r_1})$ or $(c_{s_1}, s_1)$ is P the remaining points of U are soon
determined. Otherwise the process continues and there result an increasing sequence
of points of L and a similar sequence for U; the process terminates when
$(a_n, b_n)$ has been reached in both, when all $N(x, y)$ will have been found.
It is clear that for particular cases alternative methods of solution will prove
more convenient.

6. Connection with estimation. Suppose that the point $(t, u)$ is accessible and
let $N^{*}(x, y)$ be the number of admissible paths from $(t, u)$ to $(x, y)$, where $(x, y)$
is in B. Then Girshick, Mosteller and Savage have shown that $N^{*}(x, y)/N(x, y)$
is an unbiased estimate of $p^{t}(1 - p)^{u}$, and a necessary and sufficient condition
for it to be the unique unbiased estimate is that the accessible points not in B
form a simple finite region. Hence from Theorem 2 such estimates are unique
for schemes with boundaries of minimum size. An alternative proof is given by
considering that if two unbiased estimates of any function of $p$ exist and $f(x, y)$
is the difference between them at $(x, y)$,

$$\sum_{B} f(x, y)\, N(x, y)\, p^{x} (1 - p)^{y} = 0,$$

where $f(x, y)$ is not everywhere zero. The equations formed by equating
coefficients have rank $(n + 1)$ as shown by Theorem 3, so that the only solution is
$f(x, y) N(x, y) = 0$. Since each $N(x, y)$ is certainly positive it follows at once that
$f(x, y) = 0$ and there can only be one unbiased estimate.


7. An illustration. As an application of the method we take the interesting
rectifying sequential inspection scheme discussed by Anscombe. The boundary
points are at $(H, 0), (H + b, 1), \ldots, (H + mb, m)$, where $m$ is the greatest integer
less than $(N - H)/(b + 1)$, and thereafter on the line $x + y = N$. The equations
for $N(x, y)$ take here their simplest form, namely equation (4) of Barnard's
paper. From the coefficients of $q^{0}, q^{1}, \ldots, q^{y}, \ldots$,

$$1 = N(H, 0);$$

$$0 = N(H + b, 1) - H\,N(H, 0), \quad \text{whence } N(H + b, 1) = H;$$

$$0 = N(H + 2b, 2) - \binom{H + b}{1} N(H + b, 1) + \binom{H}{2} N(H, 0), \quad \text{whence } N(H + 2b, 2) = \frac{H(H + 2b + 1)}{2!};$$

$$0 = N(H + 3b, 3) - \binom{H + 2b}{1} N(H + 2b, 2) + \binom{H + b}{2} N(H + b, 1) - \binom{H}{3} N(H, 0),$$

$$\text{whence } N(H + 3b, 3) = \frac{H(H + 3b + 2)(H + 3b + 1)}{3!}.$$

It now appears reasonable to guess the general term as

$$N(H + yb, y) = \frac{H}{y!}\,(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1).$$

The proof is therefore complete if we show

$$\binom{H}{y} - \binom{H + b}{y - 1} H + \binom{H + 2b}{y - 2} \frac{H(H + 2b + 1)}{2!} - \binom{H + 3b}{y - 3} \frac{H(H + 3b + 2)(H + 3b + 1)}{3!}$$

$$+ \cdots + (-1)^{y}\, \frac{H(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1)}{y!} = 0.$$





Put $(b + 1) = t$ and the left hand side becomes

$$H\left[\frac{(H - 1)!}{(H - y)!\,y!} - \frac{(H + t - 1)!}{(H + t - y)!\,(y - 1)!\,1!} + \frac{(H + 2t - 1)!}{(H + 2t - y)!\,(y - 2)!\,2!} - \cdots + (-1)^{y} \frac{(H + yt - 1)!}{(H + yt - y)!\,y!}\right],$$

which is y times the coefficient of t"’" m (1 + x [(1 + 1)~^ - f)\ 

Rewriting the latter as (1 + - (1 + 7’y]^ it becomes clear that the 

highest power of I is whence the required result follows. 
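
The general term can also be checked numerically. The sketch below counts paths directly for one illustrative choice of the constants (H = 2, b = 1, N = 10, chosen here and not taken from the paper) and compares the counts on the sloping part of the boundary with the general term, written as (H/k) C(k, y) with k = H + y(b + 1). The function names are assumptions introduced for illustration, and direct counting stands in for the method of equating coefficients.

```python
from fractions import Fraction
from math import comb


def anscombe_counts(H, b, N_max):
    """Path counts N(x, y) for the boundary x = H + b*y, truncated on x + y = N_max."""
    reach, counts = {(0, 0): 1}, {}
    for _ in range(N_max + 1):
        nxt = {}
        for (x, y), ways in reach.items():
            if x == H + b * y or x + y == N_max:
                counts[(x, y)] = ways          # boundary point: sampling stops
                continue
            for step in ((x + 1, y), (x, y + 1)):
                nxt[step] = nxt.get(step, 0) + ways
        reach = nxt
    return counts


def general_term(H, b, y):
    """H (H + yb + y - 1) ... (H + yb + 1) / y!, written with a binomial coefficient."""
    k = H + y * (b + 1)
    return Fraction(H, k) * comb(k, y)


H, b, N_max = 2, 1, 10                         # illustrative values only
counts = anscombe_counts(H, b, N_max)
m = 3                                          # greatest integer less than (N - H)/(b + 1) = 4
sloped = {y: counts[(H + b * y, y)] for y in range(m + 1)}
p = Fraction(1, 3)
total = sum(c * p**x * (1 - p)**y for (x, y), c in counts.items())
print(sloped, total)
```

For these values the boundary has N + 1 points, so it is a boundary of minimum size in the sense of section 4.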


REFERENCES

[1] F. J. Anscombe, "Linear sequential rectifying inspection for controlling fraction defective," Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 216-222.

[2] G. A. Barnard, "Sequential tests in industrial statistics," Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 1-21.

[3] J. P. Burman, "Sequential sampling formulas for a binomial population," Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 98-103.

[4] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain binomial sampling problems with applications," Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[5] J. B. S. Haldane, "On a method of estimating frequencies," Biometrika, Vol. 33 (1946), pp. 222-225.

[6] C. M. Stockman and P. Armitage, "Some properties of closed sequential schemes," Roy. Stat. Soc. Jour. (supplement), Vol. 8 (1946), pp. 104-112.

[7] J. Wolfowitz, "On sequential binomial estimation," Annals of Math. Stat., Vol. 17 (1946), pp. 489-492.



NOTES

This section is devoted to brief research and expository articles and other short items.


NON-PARAMETRIC TOLERANCE LIMITS¹


By R. B. Murphy
Princeton University


1. Summary. In this note are presented graphs of minimum probable population
coverage by sample blocks determined by the order statistics of a sample
from a population with a continuous but unknown cumulative distribution
function (c.d.f.). The graphs are constructed for the three tolerance levels .90,
.95, and .99. The number, $m$, of blocks excluded from the tolerance region runs
as follows: $m$ = 1(1)6(2)10(5)30(10)60(20)100, and the sample size, $n$, runs from
$m$ to 500.

Thus the curves show the solution, $\beta$, of the equation $1 - \alpha = I_{\beta}(n - m + 1, m)$
for $\alpha$ = .90, .95, .99 over the range of $n$ and $m$ given above,
where $I_x(p, q)$ is Pearson's notation for the incomplete beta function.
Examples are cited below for the one- and two-variate cases. Finally, the
exact and approximate formulae used in computations for these graphs are given.


2. Introduction. Suppose a sample of size $n$ is drawn from a population having
a continuous cumulative distribution function (c.d.f.), $F(x)$. Let the sample
values arranged in order of increasing magnitude be $x_1, x_2, \ldots, x_n$. The fraction,
$u$, of the population which is included between $x_r$ (the $r$-th smallest value
in the sample) and $x_{n-s+1}$ (the $s$-th largest value) is $F(x_{n-s+1}) - F(x_r)$. This
quantity $u$ has been called the population coverage for the interval $(x_r, x_{n-s+1})$.
The probability element for this coverage is

(2.1) $$f(u)\,du = \frac{\Gamma(n + 1)}{\Gamma(n - m + 1)\,\Gamma(m)}\, u^{n-m} (1 - u)^{m-1}\,du,$$

where $m = r + s$. From (2.1) we can calculate the probability that this coverage
is at least a given amount, say $\beta$. If we call this probability $\alpha$, we have

(2.2) $$\alpha = \int_{\beta}^{1} f(u)\,du.$$

The quantity $\alpha$ is the probability that at least $100\beta\%$ of the population is included
between $x_r$ and $x_{n-s+1}$, and it is called the tolerance level. It depends
only on $n$ and $m$ $(= r + s)$.

¹ All computations involved in this paper were carried out under an Office of Naval
Research contract.




The idea of coverage is more general than it first appears. If we think of
$x_1, x_2, \ldots, x_n$ as points plotted along the $x$-axis, we will then have $n + 1$
intervals: $(-\infty, x_1), (x_1, x_2), \ldots, (x_n, +\infty)$, which, following Tukey [3], we will
call blocks. The reason for this term will be clear when we deal with the case of a
sample from a population of more than one variable. The coverage for the $i$-th
block $(x_i, x_{i+1})$ is $F(x_{i+1}) - F(x_i)$. The probability element of the sum of the
coverages of any preassigned group of $n - m + 1$ blocks is given by (2.1) and
hence the probability $\alpha$ that the fraction of the population covered by any
$n - m + 1$ blocks is at least $\beta$ is given by (2.2). By preassigned blocks we mean ones
designated by order statistics prior to obtaining any sample from which a prediction is
to be made with these blocks. In general it is not legitimate, after taking a sample
and for some reason evident only then, to specify which blocks in this sample are
to be included or excluded from the coverage. There is no objection, however,
to specifying a scheme of blocks for the coverage on the basis of past samples
when the scheme is to be applied to future samples.

The purpose of this note is to present graphs of $\beta$ as a function of $n$ for $m$ =
1(1)6(2)10(5)30(10)60(20)100 and for $\alpha$ = .90, .95, .99. There are three figures:
Figure 1 gives curves for $\alpha$ = .90, Figure 2 for $\alpha$ = .95, and Figure 3 for $\alpha$ =
.99. The graphs are accurate to at least two decimal places but never more than
three. In terms of the Pearson notation (2.2) gives, after minor alteration,
$1 - \alpha = I_{\beta}(n - m + 1, m)$. Hence these graphs may also be used to find
the 10, 5 and 1 per cent points of a variate $X$ $(0 \le X \le 1)$ with the c.d.f. $I_x(p, q)$
for $1 \le p \le 500$ and $1 \le q \le 100$.

3. Computations for the graphs. If in the relation (2.2) three of the arguments
$\alpha$, $\beta$, $m$, and $n$ are given, the solution for the fourth may often be found
in Pearson [5] or Thompson [6]. The values of $\beta$ through $n$ = 100 were computed
exactly for these graphs. For larger $n$, $\beta$ was computed approximately
from

( 31 ) s ~ ry W - 2m)^ + lOnCw - ?n) - ( xl - 2w) ' 

where $\chi^2_{\alpha}$ is determined by the relation

$$\Pr(\chi^2 \ge \chi^2_{\alpha}) = 1 - \alpha$$

and $\chi^2$ has $2m$ degrees of freedom. This approximation is due to Scheffé and Tukey.
For large $m$ the Cornish-Fisher approximation to $\chi^2_{\alpha}$ was used.

4. Illustrations of the one-variate case. The most common use to which
the graphs presented here may be put is in the prediction of $\beta$ in sampling from
a distribution of a single random variable. It is this case that was first presented
by Wilks [1]. Suppose in the mass production of a certain type of screw one is
interested in the least proportion of all screws manufactured that have lengths
between the least and greatest lengths appearing in a random sample of 100
screws. It is assumed that we do not know the distribution of the length, $X$,
of a screw produced in this process. Furthermore, it is assumed, of course, that
the manufacturing process is in a state of statistical control in the sense of
Shewhart. We plan to discard two blocks: $(-\infty, x_1)$ and $(x_{100}, +\infty)$, exactly
as many blocks as observations. At the level $\alpha$ = .99 we obtain from Figure 3
that at least 93.5% of all screws in the population sampled have lengths that fall
between $x_1$ and $x_{100}$. If we now draw a random sample of 100 screws and find
the least and greatest screw lengths to be 1.40 and 1.60 inches respectively, we
may say that at least 93.5% of all screws from the population sampled have
lengths between 1.40 and 1.60 inches at the .99 tolerance level. It must be
observed that the prediction is made on the basis of preassigned order statistics,
and not of the values 1.40 and 1.60.

[Figures 1-3. Graphs of population coverage for the tolerance levels .90, .95, and .99.]

We might equally as well have put the question in another way: If we want
at least 93.5% of the lengths of all screws to lie within the range of lengths of a
sample of 100 screws, then at the tolerance level $\alpha$ = .99 what is the smallest
sample we could have in which as many as 2% of the sample are not acceptable?
Examining the intersections of the curves in Figure 3 with the line $\beta$ = .935 we
choose the smallest $n$ such that $m/n \le .02$ and find $n$ = 100.
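
The value read from Figure 3 in this example can be reproduced approximately by solving $1 - \alpha = I_{\beta}(n - m + 1, m)$ directly. The following Python sketch is one way to do so under stated assumptions: it uses the standard identity that, for integer arguments, the incomplete beta function equals a binomial tail sum, and it finds $\beta$ by bisection. The function names are introduced here for illustration, and this is not the computing procedure used for the graphs.

```python
from math import comb


def coverage_tail(beta, n, m):
    """I_beta(n - m + 1, m), evaluated as P(Bin(n, beta) >= n - m + 1)."""
    return sum(comb(n, j) * beta**j * (1 - beta)**(n - j)
               for j in range(n - m + 1, n + 1))


def solve_beta(n, m, alpha, tol=1e-10):
    """Bisection for the beta satisfying I_beta(n - m + 1, m) = 1 - alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # coverage_tail increases with beta, so move the lower end up
        # while the tail probability is still below the target.
        if coverage_tail(mid, n, m) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return lo


# The screw example: n = 100 observations, m = 2 discarded blocks, alpha = .99.
beta = solve_beta(n=100, m=2, alpha=0.99)
print(beta)
```

Because $n$ and $m$ are integers in every case covered by the graphs, the binomial sum avoids any need for a general incomplete beta routine.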

5. The case of more than one variate. The ideas given in the introduction
may be extended to sampling situations involving two or more statistically
dependent variates with a continuous joint c.d.f. by means of the notion of blocks.
The abstract formulation is given by Tukey [3]. We shall restrict ourselves to
the case of two dependent variates $X$ and $Y$, but the generalization is obvious.
Because of the dependence, the joint population of $X$ and $Y$ may be expressed
as an associated pair of values $W = (X, Y)$. Suppose a sample of size $n$ is drawn
from this population, and let the pairs be $w_1, w_2, \ldots, w_n$, where $w_i = (x_i, y_i)$.
If we now choose a sequence of $n$ numerically valued functions of $x$ and $y$ (or of $w$),
$f_1(w), f_2(w), \ldots, f_n(w)$, let us order the $w_i$ in a sequence $w_1^{(1)}, w_2^{(1)}, \ldots, w_n^{(1)}$ such that
$f_1(w_{i+1}^{(1)}) > f_1(w_i^{(1)})$. Imagine now that the sample values are plotted in a plane
scatter diagram. We call the first block the set of points $w = (x, y)$ such that
$f_1(w) < f_1(w_1^{(1)})$. That is, we may imagine the curve $f_1(x, y) - f_1(w_1^{(1)}) = 0$
plotted in the plane and that the first block is bounded by this curve. Then
discarding $w_1^{(1)}$ we take the $n - 1$ remaining $w_i$ and order them in a sequence
$w_1^{(2)}, w_2^{(2)}, \ldots, w_{n-1}^{(2)}$ such that $f_2(w_{i+1}^{(2)}) > f_2(w_i^{(2)})$. We call the second block the
set of points $w = (x, y)$ such that $f_1(w) > f_1(w_1^{(1)})$ and also $f_2(w) < f_2(w_1^{(2)})$.
Thus the second block is bounded by the curves $f_1(x, y) - f_1(w_1^{(1)}) = 0$ and
$f_2(x, y) - f_2(w_1^{(2)}) = 0$. If we continue this process of discarding and reordering,
until all $n$ functions $f_i$ are used, we shall obtain a division of the plane into
$n + 1$ non-overlapping blocks, the "extra" block arising at the last step in the
process. Then the fraction, $u$, of "points" $(X, Y)$ of the joint population of
$X$ and $Y$ that are covered by any $n - m + 1$ blocks has the probability element
(2.1). Also the probability $\alpha$ that the population coverage, $u$, will be at least as
large as $\beta$ is given by (2.2). The $n - m + 1$ blocks constitute a tolerance region.





An extension of this case has been made by Wald [2]. Namely, before a
sample is taken let us choose a numerically valued function $f$ of $w$ and choose
$k$ $(< n)$ of the $w_i$ and order them in a sequence $w_{a_1}, w_{a_2}, \ldots, w_{a_k}$ such that
$f(w_{a_1}) > f(w_{a_2}) > \cdots > f(w_{a_k})$. Then within each "strip" of the $(x, y)$ plane
such that $w = (x, y)$ satisfies $f(w_{a_i}) > f(w) > f(w_{a_{i+1}})$ suppose that we follow
the construction in the previous paragraph. Then the population coverage, $u$,
by $n - m + 1$ blocks from one or more of these strips or their exteriors has the
probability element (2.1).

Again the warning must be made that the above functions $f, f_1, f_2, \ldots, f_n$,
the numbers $a_1, a_2, \ldots, a_k$ and the sequence of construction must be completely
specified before samples are drawn to which this scheme is to be applied.

6. Illustrations for two variates. As an example of the use of the graphs for a
two-variate case, we use an example cited by Tippett [8]. The two variates are
the percentage of pig iron, $X$, and the lime consumption, $Y$, per cwt. of steel in
100 steel castings made without slag control. A scatter diagram is given in
Figure 4. Unfortunately the value of this example is lessened by the fact that
the block schemes were made after the sample had been taken; it does illustrate,
at least, the two simple types of scheme.

The tolerance region $T$ (solid lines in Figure 4) resulted from the following
scheme: let $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = f_5(w) = f_6(w) = -y$. Now
follow the Wald procedure choosing $f(w) = y$ with $k = 6$, and $a_1 = 1$, $a_2 = 13$,
$a_3 = 40$, $a_4 = 76$, $a_5 = 90$, $a_6 = 96$. Then in each strip $y_{a_i} > y > y_{a_{i+1}}$ let
$f_i(w) = x$. Considering only the blocks within the heavy line as the tolerance
region, we have, by counting the discarded blocks, $m = 16$.

In constructing the region $T'$ (broken lines in Figure 4) we also use Wald's
method, taking $f(w) = y - 5x$ with $k = 2$ and $a_1 = 3$, $a_2 = 96$. In the exterior
region with $f(w) > f(w_{a_1})$ let all $f_i = y + 5x$ and similarly in the exterior region
$f(w) < f(w_{a_2})$. Then in the strip $f(w_{a_1}) > f(w) > f(w_{a_2})$ (i.e., in the region in
which $41 > y - 5x > -77$) choose $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = -y$,
$f_5(w) = f_6(w) = f_7(w) = y + 5x$, and $f_8(w) = f_9(w) = -y - 5x$. Counting
the blocks outside the heavily bordered region, we have $m = 17$.

We obtain by interpolation $\beta$ = .80 for $T$ and $\beta$ = .78 for $T'$ at the $\alpha$ = .90
level.

7. Ties. A tie is a sample point which in a coordinate system defining a set
of order statistics coincides in one or more coordinates with other sample points.
For instance, in the $X$ coordinate of our example (32, 159) and (32, 186) are tied,
and (47, 218) and (47, 218) are tied in any system of coordinates. It would
seem easier to avoid ties with regions of the type of $T'$ than with those of the
type of $T$.

The existence of ties in the population is assumed impossible, because positive
point probabilities would destroy the continuity of the c.d.f. Therefore we
attribute the ties to the crudity of measuring devices.



A procedure for handling ties is given by Tukey [4].

8. Acknowledgments. The author wishes gratefully to acknowledge the
assistance of Dr. S. S. Wilks in the preparation of this note and of Dr. J. W.
Tukey, who also suggested the data used in section 6.



REFERENCES

[1] S. S. Wilks, "Statistical prediction with special reference to the problem of tolerance limits," Annals of Math. Stat., Vol. 13 (1942), pp. 400-409.

[2] A. Wald, "An extension of Wilks' method for setting tolerance limits," Annals of Math. Stat., Vol. 14 (1943), pp. 46-56.

[3] J. W. Tukey, "Non-parametric estimation II. Statistically equivalent blocks and tolerance regions - the continuous case," Annals of Math. Stat., Vol. 18 (1947), pp. 529-539.

[4] J. W. Tukey, "Non-parametric estimation III. Statistically equivalent blocks and multivariate tolerance regions - the discontinuous case," Annals of Math. Stat.

[5] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge, 1934.

[6] C. M. Thompson, "Tables of percentage points of the incomplete beta function," Biometrika, Vol. 32 (1941).

[7] H. Goldberg and H. Levine, "Approximate formulas for the percentage points and normalization of t and $\chi^2$," Annals of Math. Stat., Vol. 17 (1946), pp. 216-225.

[8] L. H. C. Tippett, Statistical Methods in Industry, Iron and Steel Industrial Research Council, British Iron and Steel Federation, 1943.


THE FOURTH DEGREE EXPONENTIAL DISTRIBUTION 
FUNCTION* 

By Leo A. Aroian
Hunter College

We shall derive a recursion formula for the moments of the fourth degree
exponential distribution function, state its more characteristic features, and show
how the graduation of observed distributions may be accomplished by the method
of moments and the method of maximum likelihood. The purpose of the note
is to make possible a wider use of this function.

R. A. Fisher [1] introduced the fourth degree exponential function

(1) $$y_t = k \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2 + \beta_1 t)\},$$

where $r_1 \le t \le r_2$, $t = (x - m)/\sigma$, $m$ indicates the population mean, $\sigma$ the
population standard deviation, and where the $\beta$'s are functions of

$$\alpha_n = \int_{r_1}^{r_2} t^{n} y_t\,dt.$$

A. L. O'Toole, in two stimulating papers [2], [3], has studied (1); however, his
methods and results are unnecessarily complicated. O'Toole requires eight
moments to determine parameters similar to the $\beta$'s. Both Fisher and O'Toole
considered the restricted class of (1) with range $(-\infty, \infty)$.

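Let

(2) $$u = \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2 + \beta_1 t)\}, \qquad dv = t^{n}\,dt$$

in

(3) $$\alpha_n = \int_{r_1}^{r_2} t^{n} y_t\,dt,$$ obtaining

(4) $$4\beta_4 \alpha_{n+3} + 3\beta_3 \alpha_{n+2} + 2\beta_2 \alpha_{n+1} + \beta_1 \alpha_n = n\,\alpha_{n-1}, \qquad n = 1, 2, 3, \ldots,$$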

* Presented to the American Mathematical Society and the Institute of Mathematical
Statistics, September 4, 1947.


and for $n = 0$, the right side of (4) is defined as zero. The result (4) is valid under
the assumption

(5) $$\left[\,t^{n} y_t\,\right]_{r_1}^{r_2} = 0.$$

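Given the first six moments, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ are readily determined. It will be
found that if $\beta_4 > 0$, $\beta_3 \gtrless 0$, then $r_1 = -\infty$, $r_2 = \infty$; while if $\beta_4 < 0$, and $\beta_3 \gtrless 0$,
$r_1$ and $r_2$ will be finite. If we set $n = 0, 1, 2, 3$, in (4), the solutions are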

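$$\beta_4 = \{\alpha_3(\alpha_5 - 4\alpha_3) - (\alpha_4 - 3)(\alpha_4 - 1)\} \div 4D;$$

$$\beta_3 = \{-\alpha_3(\alpha_6 - 3\alpha_4 - \alpha_3^2) + (\alpha_5 - \alpha_3)(\alpha_4 - 3)\} \div 3D;$$

(6)

$$\beta_2 = \{(\alpha_3 - \alpha_5)(\alpha_5 - 4\alpha_3) + (\alpha_4 - 1)(\alpha_6 - \alpha_3^2 - 3\alpha_4)\} \div 2D;$$

$$\beta_1 = \{\alpha_3(\alpha_6 - \alpha_3\alpha_5 - 3\alpha_4 + 3\alpha_3^2) - (\alpha_4 - 3)(\alpha_5 - \alpha_3\alpha_4)\} \div D,$$

$$D = (\alpha_6 - \alpha_4^2 - \alpha_3^2)(\alpha_4 - \alpha_3^2 - 1) - (\alpha_5 - \alpha_3 - \alpha_3\alpha_4)^2 \ge 0.$$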


To prove $D \ge 0$ we adopt the method of J. E. Wilkins Jr. [4]. In only a trivial
case is $D = 0$. Let

$$G(a, b, c, d) = \int_{r_1}^{r_2} (a + bt + ct^2 + dt^3)^2\, y_t\,dt \ge 0,$$

where $y_t$ is any probability function with range $r_1 \le t \le r_2$. Since $G(a, b, c, d)$
is a semi-definite quadratic form, its discriminant will be non-negative. But
its discriminant is easily seen to be equal to $D$, thus

(7) $$D = \begin{vmatrix} 1 & 0 & 1 & \alpha_3 \\ 0 & 1 & \alpha_3 & \alpha_4 \\ 1 & \alpha_3 & \alpha_4 & \alpha_5 \\ \alpha_3 & \alpha_4 & \alpha_5 & \alpha_6 \end{vmatrix} \ge 0.$$


We summarize without proofs the essential features of the fourth degree
exponential. Near the normal point, $\alpha_4 = 3$, $\alpha_3 = 0$, the fourth degree
exponential function, the Pearson system, and the Gram-Charlier Type A are
essentially alike. Type C [5] while similar is not the same. Note that $\beta_4$ may be
negative and in such a case $r_1$ and $r_2$ are the two real zeros of the derivative of (1).
The exponential may be bimodal as well as unimodal and the normal curve is
the special case $\beta_1 = \beta_3 = \beta_4 = 0$. Various special cases where a particular $\beta$
is zero are readily handled by either (4) or (6). The graduation of both unimodal
and bimodal observed distributions will be published elsewhere.

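Let

(8) $$y_t = k \exp\Big\{-\sum_{j=1}^{r} \beta_j t^{j}\Big\}, \qquad r_1 \le t \le r_2,$$

where

(9) $$\frac{1}{k} = \int_{r_1}^{r_2} \exp\Big\{-\sum_{j=1}^{r} \beta_j t^{j}\Big\}\,dt.$$

The likelihood, $L$, in a sample of $N$ is given by

(10) $$L = k^{N} \exp\Big\{-\Big(\beta_1 \sum_{i=1}^{N} t_i + \beta_2 \sum_{i=1}^{N} t_i^{2} + \cdots + \beta_r \sum_{i=1}^{N} t_i^{r}\Big)\Big\},$$

where $t_i = (x_i - m)/\sigma$. Then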


( 11 ) 


fl log /j 

7. aft- 


N dk V' 

TaA" 


and 



If wc assume either ri and rj constant, or exp 
negligihle, then (12) henomes 


I- KArl 


j-i 


and exp 




- 

1-1 


dl and 


^log-^ = n 
L0A 


implies 



where $a_j$ is the sample estimate of $\alpha_j$. For, if in $\sum t_i^{j}/N$ we let $j = 1, 2$, we find
by (13) that $\bar{x} = m$, and $\sigma^2 = \sum (x_i - \bar{x})^2/N$. The solution of (13) provides
estimates of $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$, if we set $r = 4$. Naturally more time is required
for the solution of (13) as compared with the method of moments, but the
maximum likelihood estimates are asymptotically efficient. The system (13) must
be solved by successive approximations. To determine the moments solution
all we do is to replace $\alpha_j$ by $a_j$ in equations (6). This affords a point of departure
from which the maximum likelihood equations may be solved. The two methods
are not the same.

The fourth degree exponential is readily generalized to a fourth (or $r$th) degree
multivariate function including the normal multivariate function as a special
case.


REFERENCES

[1] R. A. Fisher, "The mathematical foundations of theoretical statistics," Roy. Soc. Phil. Trans., Vol. 222, Series A (1922), pp. 353-56 particularly.

[2] A. L. O'Toole, "On the system of curves for which the method of moments is the best method of fitting," Annals of Math. Stat., Vol. 4 (1933), pp. 1-29.

[3] A. L. O'Toole, "A method of determining the constants in the bimodal fourth degree exponential function," Annals of Math. Stat., Vol. 4 (1933), pp. 79-93.

[4] J. Ernest Wilkins, Jr., "A note on skewness and kurtosis," Annals of Math. Stat., Vol. 15 (1944), pp. 333-335.

[5] C. V. L. Charlier, "A new form of the frequency function," Lunds universitet, Acta, N. F. Bd. 24 (1928), Avd. 2, Nr. 8, pp. 1.

[6] M. G. Kendall, The Advanced Theory of Statistics, Griffin & Co., London, Vol. II, p. 43.


AN APPROXIMATION TO THE BINOMIAL SUMMATION 
By G. F. Cramer
Washington, D. C.

We consider the binomial expansion $(q + p)^{n}$, where $q = 1 - p$ and $n$ is a
positive integer. For given values of $n$, $p$, $r$, and $s$, where $np < r < s \le n$,
we are often interested in the probability $P(r \le x \le s)$ that the number of
successes $x$ will satisfy $r \le x \le s$.

When $n$ does not exceed 50, we can use tables of the Incomplete Beta Function,
or other convenient and accurate tables. For "large" values of $n$, we can use
normal tables. When $p$ is "small", we can use Poisson tables. However, it is
often true that $p$ is fairly small, and yet not small enough to give really accurate
results when Poisson tables are employed in the usual way, while $n$ is too large
for use of the tables of the Incomplete Beta Function and yet too small for
accurate use of normal tables.

It frequently happens that an upper bound for $P(r \le x \le s)$ would serve our
purpose. We propose to show how to find this from Poisson tables with greater
accuracy than could be obtained by using these tables in the ordinary way.

We shall denote the general term of the binomial expansion by $B_i = \binom{n}{i} p^i q^{n-i}$
and the general term of the corresponding Poisson distribution with the same
value of $p$ by $P_i = (pn)^i e^{-pn}/i!$. We shall also consider a second Poisson
distribution whose general term is given by $P'_i = (p'n)^i e^{-p'n}/i!$, where $p' \ne p$
will be determined later.

We shall use the following notations:

(1) $U_i = B_{i+1}/B_i = (n - i)p/[(i + 1)(1 - p)]$;

(2) $V_i = P_{i+1}/P_i = pn/(i + 1)$;

(3) $V'_i = P'_{i+1}/P'_i = p'n/(i + 1)$;

(4) $U_i - V_i = p(np - i)/[(i + 1)(1 - p)]$.

From (4) we obtain at once the following:

Lemma I. $U_i > V_i$ or $U_i < V_i$ according as $i < np$ or $i > np$.

Thus, the size of the general term of the binomial expansion falls off more
steeply to the right of $i = np$ than does that of the general Poisson term.
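Identity (4) and Lemma I are easy to check numerically. The following is a minimal Python sketch, not part of the original note; the function names are illustrative assumptions, and the values n = 40, p = 1/10 are those of the worked example given later in the note.

```python
from math import comb, exp, factorial


def binom_term(n, p, i):
    """B_i: binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_term(mean, i):
    """P_i: Poisson probability of i occurrences (mean = np in the text)."""
    return mean**i * exp(-mean) / factorial(i)


def ratio_gap(n, p, i):
    """U_i - V_i, computed directly from the successive-term ratios (1) and (2)."""
    u = binom_term(n, p, i + 1) / binom_term(n, p, i)
    v = poisson_term(n * p, i + 1) / poisson_term(n * p, i)
    return u - v


def gap_formula(n, p, i):
    """Right-hand side of identity (4)."""
    return p * (n * p - i) / ((i + 1) * (1 - p))


if __name__ == "__main__":
    n, p = 40, 0.1  # np = 4, so the sign of U_i - V_i should change at i = 4
    for i in range(10):
        print(i, round(ratio_gap(n, p, i), 6), round(gap_formula(n, p, i), 6))
```

The printed gap is positive for i < 4, vanishes at i = 4, and is negative for i > 4, which is the content of Lemma I for this choice of n and p.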



We can use Lemma I to obtain an upper bound to $P(r \le x \le s)$ for any $r > np$.
In fact,

$B_r = B_r P_r/P_r$;

$B_{r+1} < B_r P_{r+1}/P_r$;

$B_{r+2} < B_{r+1} P_{r+2}/P_{r+1} < B_r P_{r+2}/P_r$;

. . . . . . . . . .

$B_s < B_r P_s/P_r$.

Adding these, we obtain

(5) $P(r \le x \le s) = \sum_{t=r}^{s} B_t < (B_r/P_r) \sum_{t=r}^{s} P_t = (B_r/P_r) \left( \sum_{t=r}^{\infty} P_t - \sum_{t=s+1}^{\infty} P_t \right)$.

The quantity in parentheses in (5) can be found by use of the cumulative Poisson
table provided, of course, it is within the range of that table, while the
ratio $B_r/P_r$ can be computed directly.
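The bound (5) can be sketched in a few lines of Python. This is an illustration rather than the author's procedure: the Poisson sum is computed term by term instead of being read from a cumulative table, and all function names are assumptions introduced here.

```python
from math import comb, exp, factorial


def binom_term(n, p, i):
    """B_i: binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_term(mean, i):
    """P_i: Poisson probability of i occurrences."""
    return mean**i * exp(-mean) / factorial(i)


def exact_sum(n, p, r, s):
    """P(r <= x <= s) by direct summation of binomial terms."""
    return sum(binom_term(n, p, t) for t in range(r, s + 1))


def bound_5(n, p, r, s):
    """Upper bound (5): (B_r / P_r) times the Poisson sum from r to s.

    Valid for r > np, where Lemma I applies.
    """
    mean = n * p
    poisson_sum = sum(poisson_term(mean, t) for t in range(r, s + 1))
    return binom_term(n, p, r) / poisson_term(mean, r) * poisson_sum


if __name__ == "__main__":
    n, p, r, s = 40, 0.1, 10, 40
    print("exact    :", exact_sum(n, p, r, s))
    print("bound (5):", bound_5(n, p, r, s))
```

For n = 40, p = 1/10, r = 10, s = 40 the sketch gives a bound of about .00552 against an exact sum of about .00506.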

In the work we have done so far, we have used a Poisson distribution which
is less steep than the corresponding binomial distribution throughout the whole
interval $np < r \le x \le n$. It seems reasonable to investigate the possibility of
improving upon (5) by using a Poisson distribution having a different value $p'$
in place of $p$, where $p'$ is chosen so that the new Poisson distribution is of the
same steepness at $x = r$ as is the binomial distribution. We wish to have
$U_r = V'_r$ and $U_i < V'_i$ for all $r < i < n$. The first of these conditions requires
that $(n - r)p/[(r + 1)(1 - p)] = p'n/(r + 1)$. Solving for $p'$ we obtain

(6) $p' = (n - r)p/[n(1 - p)]$.

We are now ready to prove the following:

Lemma II. If $p'$ is defined by (6) and if $U_i$, $V_i$, and $V'_i$ are defined by (1), (2),
and (3) respectively, then $U_i < V'_i < V_i$, provided $r > np$ and $i > r$.

It is easy to see that $U_i/V'_i = (n - i)(p)(i + 1)/[(i + 1)(1 - p)(np')]$, and
this can be reduced to $(n - i)/(n - r)$ by replacing $p'$ by its value from (6).
Then $U_i/V'_i < 1$ since $i > r$. Moreover, we have $V'_i/V_i = (p'n)(i + 1)/[(i + 1)(pn)] = p'/p = (n - r)/(n - np)$.
But $r > np$ and hence $V'_i < V_i$.

This completes the proof of Lemma II.

We are now in a position to obtain an inequality somewhat better than (5).
The derivation of the new upper bound for $P(r \le x \le s)$ goes just as before
except that each $P_i$ is replaced by $P'_i$. We obtain the new inequality

(7) $P(r \le x \le s) < K' B_r/P'_r$,

where $K' = \sum_{t=r}^{s} P'_t$.
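A minimal Python sketch of the matched-steepness bound (7) follows. It is an illustration under the definitions (6) and (7) above, with the Poisson sum computed directly rather than taken from tables; the function names are assumptions, not the author's notation.

```python
from math import comb, exp, factorial


def binom_term(n, p, i):
    """B_i: binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_term(mean, i):
    """Poisson probability of i occurrences with the given mean."""
    return mean**i * exp(-mean) / factorial(i)


def p_prime(n, p, r):
    """Equation (6): the p' whose Poisson term ratio matches the binomial at i = r."""
    return (n - r) * p / (n * (1 - p))


def bound_7(n, p, r, s):
    """Upper bound (7): K' * B_r / P'_r with K' the Poisson(n p') sum from r to s."""
    mean = n * p_prime(n, p, r)
    k_prime = sum(poisson_term(mean, t) for t in range(r, s + 1))
    return k_prime * binom_term(n, p, r) / poisson_term(mean, r)


def exact_sum(n, p, r, s):
    """P(r <= x <= s) by direct summation, for comparison."""
    return sum(binom_term(n, p, t) for t in range(r, s + 1))


if __name__ == "__main__":
    n, p, r, s = 40, 0.1, 10, 40
    print("p'       :", p_prime(n, p, r))
    print("exact    :", exact_sum(n, p, r, s))
    print("bound (7):", bound_7(n, p, r, s))
```

With n = 40, p = 1/10, r = 10 the sketch gives p' = 1/12 and a bound of about .00509, noticeably closer to the exact sum than the bound obtained with the unmodified Poisson mean np.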

We can get a lower bound as well as a somewhat improved upper bound for
$P(r \le x \le s)$ by calculating $B_r$ and $B_{r+1}$ directly and then applying (5) or (7)
to find an upper bound $M$ of $P(r + 1 \le x \le s)$. This gives the inequality

(8) $B_r + B_{r+1} < P(r \le x \le s) < B_r + M$.

This could, of course, be still further improved by calculating directly still more
of the $B$'s and using a similar procedure, but one would not care to carry this
very far.
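The bracketing idea behind (8) can be sketched as follows. The parameter k (the number of leading binomial terms computed directly) and the function names are assumptions introduced for illustration; k = 1 corresponds to inequality (8) as stated, and larger k to the further improvement mentioned in the text. Bound (5) is used for the tail.

```python
from math import comb, exp, factorial


def binom_term(n, p, i):
    """B_i: binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_term(mean, i):
    """P_i: Poisson probability of i occurrences."""
    return mean**i * exp(-mean) / factorial(i)


def bound_5(n, p, r, s):
    """Upper bound (5) for P(r <= x <= s), valid for r > np."""
    mean = n * p
    poisson_sum = sum(poisson_term(mean, t) for t in range(r, s + 1))
    return binom_term(n, p, r) / poisson_term(mean, r) * poisson_sum


def bracket_8(n, p, r, s, k=1):
    """Lower and upper bounds in the manner of (8).

    The first k terms B_r, ..., B_{r+k-1} are summed directly. The lower
    bound adds the single term B_{r+k}; the upper bound adds bound (5)
    applied to the remaining range r+k, ..., s. Requires r + k <= s.
    """
    head = sum(binom_term(n, p, t) for t in range(r, r + k))
    lower = head + binom_term(n, p, r + k)
    upper = head + bound_5(n, p, r + k, s)
    return lower, upper


if __name__ == "__main__":
    for k in (1, 2):
        print(k, bracket_8(40, 0.1, 10, 40, k))
```

Each extra directly computed term narrows the bracket, at the cost of one more binomial term evaluated by hand.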

To illustrate the various approximations, we have worked out a numerical
example the results of which appear below. For convenience in checking, we
have used a value of $n$ which is within the range of the tables of the Incomplete
Beta Function, even though we would ordinarily use our method only for larger
values of $n$.

Example. $n = s = 40$; $r = 10$; $p = 1/10$; $p' = 1/12$. The tables of the
Incomplete Beta Function give $P(10 \le x \le 40) = .0050631$. Using Poisson
tables in the usual way, we get $P(10, 4) - P(40, 4) = .008132$, which is not
particularly good. (Here $P(c, a)$ is the tabulated Poisson probability of $c$ or
more occurrences when the mean is $a$.) Using inequality (5) we obtain: $B_{10}/P_{10} = .6790$ and
$P(10 \le x \le 40) < .6790(.008132) = .005522$. Using (8) and calculating both
$B_{10}$ and $B_{11}$, we take $r = 11$ in the inequality (5) and obtain $B_{10} = .0035934$,
$B_{11} = .0010889$, $P(11, 4) - P(40, 4) = .002840$, $B_{11}/P_{11} = .5657$, and hence
$.004682 < P(10 \le x \le 40) < .003594 + .001607 = .00520$. Again using
method (8), but calculating $B_{12}$ also and using $r = 12$ in inequality (5), we get
$.004974 < P(10 \le x \le 40) < .005099$, which is quite good. We can obtain a
still better result by using inequality (7) instead of (5). Then $p' = 1/12$,
$np' = 10/3$, $B_{10}/P'_{10} = 2.150$, $P(10, 10/3) - P(40, 10/3) = .002366$, and
$P(10 \le x \le 40) < .005087$.



ABSTRACTS OF PAPERS 

Presented at the Madison Meeting of the Institute, September 7-10, 1948

1. On Distribution-free Confidence Intervals (Preliminary Report). Wassily
Hoeffding, University of North Carolina, Chapel Hill.

Let $\theta(F)$ be a functional of a distribution function (d.f.) $F(x)$ (where $x$ is a real number
or a vector), defined over a class $\mathfrak{D}$ of d.f.'s; $O_n$ a random sample from a population with
d.f. $F(x)$; $\underline{\theta}_n < \bar{\theta}_n$ two functions of $O_n$; and $\alpha_n = \Pr\{\underline{\theta}_n < \theta(F) < \bar{\theta}_n\}$. Conditions are studied
under which, given $\alpha$, $0 < \alpha < 1$, we have either $\alpha_n = \alpha$ or $\alpha_n \ge \alpha$ or $\alpha_n \to \alpha$, for all $F(x)$
in $\mathfrak{D}$, where $\mathfrak{D}$ is defined independently of the functional form of $F(x)$. Under fairly general
conditions we can obtain by "studentization" confidence limits $\underline{\theta}_n$, $\bar{\theta}_n$ such that
$\lim \alpha_n = \alpha$, and $\gamma = \lim E\sqrt{n}\,[\bar{\theta}_n - \underline{\theta}_n]$ exists; $\gamma$ is minimized by using a least variance estimate
of $\theta(F)$. If there exists a function $K(\theta)$ such that $\operatorname{var} T_n \le K^2(\theta) n^{-1}$ if $\theta(F) = \theta$, for all $F$
in $\mathfrak{D}$, we can define confidence limits with a positive lower bound for $\alpha_n$. This applies to a
number of population characteristics estimated by rank order statistics, such as the
coefficients $\rho'$ and $\tau$ (estimated by Spearman's and Lindeberg-Kendall's rank correlation
coefficients, respectively). In certain cases (including $\rho'$ and $\tau$), $\theta(F)$ admits a binomially
distributed estimate; then exact confidence limits can easily be obtained. This research
was done under an Office of Naval Research contract.


2. On Certain Statistics for Samples of 3 from a Normal Population. Julius
Lieblein, National Bureau of Standards, Washington.

In analytical chemistry three determinations are frequently made. Sometimes the
average of only the two closest results is reported, the remaining observation being rejected
as anomalous. In preparing a critique of this procedure, Dr. W. J. Youden encountered
a need for information on certain properties of the distributions of the statistics
$(x' - x'')/(x_3 - x_1)$, $(x' + x'')/2$, and $(x' - x'')/2$, where $x'$ and $x''$ ($x' > x''$) are the two
closest of the three determinations. This paper shows how these statistics differ from the
ones heretofore treated involving "fixed" order statistics, gives the distribution of these
statistics in random samples of 3 from a normal universe, and lists values of certain of the
moments of their distributions.

3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis
of Pareto's and Pearson's Curves. Maria Castellani, University of
Kansas City.

The purpose of this paper is to investigate the most probable configuration of $N$ random
elements to be distributed in $K$ ($K < N$) class intervals, where known forces are acting.
We shall call these intervals of energy, using the terminology of statistical mechanics.
We will prove that the most probable configuration is a configuration of statistical
equilibrium since its probability of occurring converges to 1 as $N$ becomes infinitely large.
The main purpose of this paper is to discover which forces of attraction, operating in
the intervals of energy, give Pareto's and Pearson's curves when statistical equilibrium
is reached.

We will consider a random variable $\gamma(t)$, $t$ being an independent variable obeying a
multinomial distribution law with limited freedom, and we will exploit the familiar process
of statistical mechanics. The equation of the frequency curves corresponding to the
equilibrium stage of the statistical experiment will be shown.


4. Fitting Generalized Truncated Normal Distributions. Harold Hotelling,
University of North Carolina, Chapel Hill.

In a sample from a $p$-dimensional normal distribution only those individuals are supposed
to be observed which fall in a specified but arbitrary set $A$ of positive measure. For
estimating the parameters the method of moments is proved equivalent to that of maximum
likelihood and therefore efficient. The problem is thus reduced to that of expressing the
parameters of the normal distribution in terms of the moments of the truncated distribution.
This however is not generally possible in simple explicit form. Methods are presented
for dealing numerically with several special cases, including those in which $A$ is a
linear interval or a parallelogram.


5. On the Distribution of the Two Closest Observations Among a Set of Three 
Independent Observations. G. R. Seth, Iowa State College. 

Let $x_1$, $x_2$, $x_3$ ($x_1 < x_2 < x_3$) be three independent ordered observations from a population
having a probability density function $f(x)$. Let $x'$, $x''$ ($x' < x''$) be the two closest; then the
probability density function of $x'$, $x''$ is given by

$6 \cdot f(x') \cdot f(x'') \, [1 + F(2x' - x'') - F(2x'' - x')]$

where

$F(x) = \int_{-\infty}^{x} f(x)\, dx$.


In the case $f(x)$ is a normal distribution with unit variance, the joint distribution
of $y = x'' - x'$ and $z = \dfrac{x'' - x'}{x_3 - x_1}$ is obtained as

$\dfrac{2\sqrt{3}}{\pi} \, \dfrac{y}{z^2} \, \exp\left[ -\dfrac{y^2 (1 - z + z^2)}{3 z^2} \right]$.


This problem is of interest in cases where the conclusions are to be based on a set of
three observations and one of the observations is to be rejected in the analysis of the data.

6. The Derivation of Certain Recurrence Formulae and their Application to the
Extension of Existing Published Incomplete Beta Function Tables. T. A.
Bancroft, Alabama Polytechnic Institute, Auburn (presented by title).

The objects of the paper are: (1) to give a number of new recurrence formulae in the
incomplete beta function derived by a new method, and (2) to indicate how these new formulae
have been used to obtain new tables of the incomplete beta function that are outside the
range of the $p$ and $q$ values given in the existing published tables.

The recurrence formulae have been derived by considering the incomplete beta function
as a special case of the hypergeometric series, thus

$B_x(p, q) = \dfrac{x^p}{p} F(p, 1 - q, p + 1, x)$,

where the usual form of the hypergeometric series is

$F(a, b, c, x) = 1 + \dfrac{a \cdot b}{c \cdot 1!}\, x + \dfrac{a(a + 1) \cdot b(b + 1)}{c(c + 1) \cdot 2!}\, x^2 + \dfrac{a(a + 1)(a + 2) \cdot b(b + 1)(b + 2)}{c(c + 1)(c + 2) \cdot 3!}\, x^3 + \cdots$.

This series converges for $|x| < 1$, and for $x = 1$ if and only if $a + b < c$. Certain recurrence
formulae for $F(a, b, c, x)$ are then directly converted for use with $B_x(p, q)$, or in the so-called
normalized form $I_x(p, q)$, provided $c = a + 1$. All conditions have been satisfied by setting
$a = p$, $b = 1 - q$, $c = p + 1$, and $q > 0$.

For example, using the above-mentioned methods we may obtain, among many others, the
recurrence formulae

(i) $x I_x(p, q) - I_x(p + 1, q) + (1 - x) I_x(p + 1, q - 1) = 0$,

(ii) $(p + q - px) I_x(p, q) - q I_x(p, q + 1) - p(1 - x) I_x(p + 1, q - 1) = 0$,

(iii) $q I_x(p, q + 1) + p I_x(p + 1, q) - (p + q) I_x(p, q) = 0$.

Formula (i) is essentially the basic recurrence formula used to obtain Karl Pearson's
tables. An indication of formula (iii) in another form was given by the author in the paper
"On Biases in Estimation Due to the Use of Preliminary Tests of Significance," Annals of
Math. Stat., Vol. 15 (1944), p. 194, and a direct proof was later given by the author in "Note
on an Identity in the Incomplete Beta Function," Annals of Math. Stat., Vol. 16 (1945), pp.
98-99. All of the material in the present paper, however, is new, including recurrence
formulae and tables and the mathematical method of derivation.
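The three recurrence formulae can be checked numerically. The sketch below is not the author's derivation: it evaluates $I_x(p, q)$ by Simpson's rule (an assumption made here for self-containedness, adequate when $p \ge 1$ and $q \ge 1$) and returns the left-hand sides of (i), (ii), and (iii), each of which should be close to zero. The function names are illustrative.

```python
from math import gamma


def reg_inc_beta(x, p, q, steps=2000):
    """I_x(p, q) by composite Simpson's rule (steps must be even)."""

    def integrand(t):
        return t ** (p - 1) * (1 - t) ** (q - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    complete_beta = gamma(p) * gamma(q) / gamma(p + q)
    return total * h / 3 / complete_beta


def recurrence_residuals(x, p, q):
    """Left-hand sides of formulae (i), (ii), (iii); each should vanish."""
    I = reg_inc_beta
    res_i = x * I(x, p, q) - I(x, p + 1, q) + (1 - x) * I(x, p + 1, q - 1)
    res_ii = ((p + q - p * x) * I(x, p, q) - q * I(x, p, q + 1)
              - p * (1 - x) * I(x, p + 1, q - 1))
    res_iii = q * I(x, p, q + 1) + p * I(x, p + 1, q) - (p + q) * I(x, p, q)
    return res_i, res_ii, res_iii


if __name__ == "__main__":
    print(recurrence_residuals(0.3, 2, 3))
    print(recurrence_residuals(0.6, 2.5, 3.5))
```

The arguments are chosen with $q \ge 2$ so that the term $I_x(p + 1, q - 1)$ keeps a bounded integrand; a production implementation would use a library routine for the incomplete beta function instead of quadrature.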


7. Asymptotic Studentization in Testing of Hypotheses. Herman Chernoff,
Cowles Commission for Research in Economics.

If $H$ is a hypothesis for which $t < c_1(\theta)$ would be a good test if the value of the nuisance
parameter $\theta$ were known and $\hat{\theta}$ is an estimate of $\theta$, then the following method of asymptotic
studentization (obtaining critical regions of almost constant size) was suggested by Wald.
Consider $t < \varphi(\hat{\theta})$ where $\varphi(\hat{\theta}) = c_1(\hat{\theta}) + \cdots + c_r(\hat{\theta})$ and $\Pr\{t < c_1(\theta)\} = \alpha$, $\Pr\{t - c_1(\hat{\theta}) < c_2(\theta)\} = \alpha$,
$\cdots$, $\Pr\{t - c_1(\hat{\theta}) - \cdots - c_s(\hat{\theta}) < c_{s+1}(\theta)\} = \alpha$. It is shown that under reasonable
conditions this test, and various modifications designed for those cases where the $c_r(\theta)$
are difficult to obtain exactly, have the asymptotic property that $\Pr\{t < \varphi(\hat{\theta})\} = \alpha + O(N^{-r/2})$
where $N$ is the size of the sample involved or an analogous variable. This
property can be extended to the case where $\theta$ is a $k$-dimensional variable.


8. Completeness, Similar Regions, and Unbiased Estimation. (Preliminary
Report.) Erich L. Lehmann and Henry Scheffé, University of California
at Los Angeles.

A family $\mathfrak{M}$ of measures $M$ on a space $X$ of points $x$ is defined to be complete
if $\int_X f(x)\, dM = 0$ for every $M$ in $\mathfrak{M}$ implies $f(x) = 0$ except on a set $A$ for which $M(A) = 0$ for
every $M$ in $\mathfrak{M}$. For a given family of measures the question of completeness may be
regarded as the question of unicity of a related functional transform. Classical unicity
results are applicable to many families of probability distributions that have been studied by
statisticians. The notion of completeness throws light on the problem of similar regions
and the problem of unbiased estimation. The concept of a maximal sufficient statistic
(roughly, a sufficient statistic that is a function of all other sufficient statistics) is developed.
A constructive method of finding such is given, which seems to apply to all examples
ordinarily considered in statistical theory. A relation between completeness and maximality
is found.


9. On a Proposed Method for Estimating Populations. Cecil C. Craig,
University of Michigan, Ann Arbor.

It was proposed to the author by a biologist that a method be devised for estimating the
total population in an area which shall utilize the minimum distances between randomly
chosen individuals and their neighbors in directions lying in each of the four quadrants.
Assuming that the area is a square and that the distribution law over it is rectangular, it
turns out that the complete distribution of the lengths of sides of minimum squares which
contain a second individual is simpler than that of minimum distances. In both cases a
simple estimate is found which uses most but not all of the information in a sample and
whose efficiency is comparable to that based on a complete enumeration of a sample area,
though such an enumeration is not always possible.

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi-
Maximum-likelihood Estimates. Herman Rubin, Institute for Advanced
Study.

The author investigates the asymptotic normality of maximum- and quasi-maximum-
likelihood estimates of parameters of systems of linear stochastic difference equations.
The principal tool is the extension of the Central Limit Theorem to dependent variables
previously obtained by the author (presented to the American Mathematical Society in April,
1948). The results obtained are analogous to those in the case in which no differences are
present. Some extensions are also made to systems of stochastic difference equations linear
in the coefficients but not necessarily in the variables. If the complete system of stochastic
difference equations is linear in the jointly dependent variables, asymptotic efficiency is
demonstrated for maximum-likelihood estimates.


11. The Probability Points of the Distribution of the Median in Random Samples
from Any Continuous Population. Churchill Eisenhart, Lola S. Deming,
and Celia S. Martin, National Bureau of Standards, Washington.

The abscissa of the (one-tail) $\epsilon$-probability point of the distribution of the median in
random samples of size $n = 2m + 1$ ($m \ge 0$) from any continuous population is identical
with the abscissa of the corresponding $P_{\epsilon,n}$-probability point of the parent distribution,
where $P_{\epsilon,n}$ is determined by

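(1) $I_{P_{\epsilon,n}}(m + 1, m + 1) = \epsilon$, $(0 \le \epsilon \le 1)$.

From (1) it follows that

(2) $P_{1-\epsilon,n} = 1 - P_{\epsilon,n}$

and that

(3) $P_{\epsilon,n} = x_{\epsilon}(n + 1, n + 1) = \dfrac{1}{1 + F_{1-\epsilon}(n + 1, n + 1)} = \dfrac{1}{1 + e^{2 z_{1-\epsilon}(n + 1, n + 1)}}$,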

where $x_{\epsilon}(\nu_1, \nu_2)$, $F_{\epsilon}(\nu_1, \nu_2)$, and $z_{\epsilon}(\nu_1, \nu_2)$ denote the $\epsilon$-probability points of the
incomplete-beta-function distribution, Snedecor's $F$-distribution and Fisher's $z$-distribution, for
$\nu_1$ ($= 2q$) and $\nu_2$ ($= 2p$) 'degrees of freedom', respectively. The foregoing results are
certainly not "new"; Harry S. Pollard implicitly utilized the first equality on the extreme left
of (3) in his doctoral dissertation at the University of Wisconsin in 1933 (see Annals of
Math. Stat., Vol. 5 (1934), p. 260), and John H. Curtiss has given the generalization of (1)
appropriate to the case of the 'kth position' in random samples from any continuous
population (see Amer. Math. Monthly, Vol. 50 (1943), p. 103) and utilized (3) explicitly to obtain
the 5% point of the distribution of the median in random samples of size $n = 23$. The aim
of the present paper is to give these results somewhat greater publicity; they are hardly
"well known". To this end a table (Table 1) is given of the values of $P_{\epsilon,n}$ to 5 significant
figures for $\epsilon$ = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25 and $n$ = 3(2)15(10)95, together
with expressions from which $P_{\epsilon,n}$ can be evaluated accurately and conveniently for values
of $n$ (and $\epsilon$) not included in the table. Numerical examples illustrate the use of the table
and formulas. Concise derivations of the fundamental relations and formulas are given
in an appendix.
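The relation between the probability points of the median and of the parent distribution can be sketched numerically. The code below assumes that relation (1) is the incomplete-beta condition $I_P(m + 1, m + 1) = \epsilon$, which is the standard distribution of the median of $2m + 1$ observations expressed on the probability scale of the parent; it solves that condition by bisection. The function names are illustrative and this is not the authors' tabulation method.

```python
from math import comb


def median_level(P, m):
    """I_P(m+1, m+1), written as a binomial sum.

    This is the chance that at least m+1 of 2m+1 independent uniform
    variates fall below P, i.e. that their median falls below P.
    """
    n = 2 * m + 1
    return sum(comb(n, j) * P**j * (1 - P) ** (n - j) for j in range(m + 1, n + 1))


def median_point(eps, n, tol=1e-12):
    """Solve I_P(m+1, m+1) = eps for P by bisection, with n = 2m + 1 odd."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    m = (n - 1) // 2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if median_level(mid, m) < eps:
            lo = mid  # level too small: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (3, 5, 7, 15):
        print(n, round(median_point(0.05, n), 5))
```

The symmetry relation (2) can be confirmed by checking that `median_point(eps, n) + median_point(1 - eps, n)` equals 1 to within the bisection tolerance.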


12. On the Arithmetic Mean and the Median in Small Samples from the Normal
and Certain Non-Normal Populations. Churchill Eisenhart, Lola S.
Deming, and Celia S. Martin, National Bureau of Standards, Washington.

Let $\bar{x}_{\epsilon,n}$ and $\tilde{x}_{\epsilon,n}$ denote the abscissae of the one-tail $\epsilon$-probability points of the
arithmetic mean and the median, more specifically, the abscissae exceeded with probability $\epsilon$
by the mean and the median, respectively, in random samples of size $n$ ($= 2m + 1$) from
any specified population, and let $\sigma_{\bar{x},n}$ and $\sigma_{\tilde{x},n}$ denote the standard deviations of the mean and
the median in such samples, respectively. The following symmetrical populations with
zero location parameters and unit scale parameters are considered in this paper:

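Type                                   Density                                   Range

normal (Gaussian)                      $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$,        $-\infty \le x \le \infty$
double-exponential (Laplace)           $\frac{1}{2} e^{-|x|}$,                    $-\infty \le x \le \infty$
rectangular (uniform)                  $1$,                                       $-\frac{1}{2} \le x \le \frac{1}{2}$
Cauchy                                 $\frac{1}{\pi} \cdot \frac{1}{1 + x^2}$,   $-\infty \le x \le \infty$
sech                                   $\frac{1}{\pi} \operatorname{sech} x$,     $-\infty \le x \le \infty$
sech$^2$ (derivative of "logistic")    $\frac{1}{2} \operatorname{sech}^2 x$,     $-\infty \le x \le \infty$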

the atoreniontioned tell-known lesser accuracy of the median as an 

gives prooiso numerical to the ^ 

estimator of the center a norm p P t population), to 4 

Values of the ratio B,.n “ X,.JX^_n are 8 J together with the best available values 

decimal places for the above oornbinations of 5 and 

of fern « 8(2)15(10)66. tiL ! ieTeJan m than the tads of 

showing that the ‘tails’ of the exact and, when 0.05 ^ ^ 

tlio normal distribution with the saine meann argument shows that the point 

0.26, the ratio R..« is loss than ■ '* ' --thud for computing oi,, based on 

of ocpiality is cloBO to the O'Ol^^iirobabihty pomtO „ > 3 . 

the foregoing, is given that is helioycd * J given to 4 decimal 

In the case of the double-exp „ q 06 0 10,0 25, for comparison with the oor- 

plaeosforn-3(2)11,= ,,j < ^„3 for . = 0 005,0,001, 

responding values of x.^. j°™„„ninle8 of 3 from a double-exponential distribution 





at0.9i5,0.f)H, and 0,nil k'vHaof confidanre. Wlinn n » fi, tlin ninanis'hi’ttpr’ at the 98 and 
.99 levels of coulidenee; and, when n •“ 7, at the (1IHI level. For all other eombinations of 
(and n (> 3), the median is ‘Ijclter.’ 

In the case of the rectangular distribution, values of $\bar{x}_{\epsilon,n}$ are tabulated to 4 decimals for
$n$ = 3(2)9, and values of the $\epsilon$-probability point of the mid-range in samples of $n$,
for $n$ = 3(2)15(10)95, in each instance for $\epsilon$ = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, and in the case
of the mid-range for $\epsilon$ = 0.001 also. The superiority of the midrange over the mean and the median,
well-known but here exhibited numerically for the first time, is truly amazing.

It is planned to provide values for samples from the sech and sech$^2$ distributions in
the final paper.

13. The Relative Frequencies with which Certain Estimators of the Standard
Deviation of a Normal Population Tend to Underestimate Its Value.
Churchill Eisenhart and Celia S. Martin, National Bureau of Standards,
Washington.

Let $x_1, x_2, \cdots, x_n$ denote a random sample of $n$ independent observations from a normal
population with mean $\mu$ and standard deviation $\sigma$. Common estimators of $\sigma$ are

$s_1 = \sqrt{\sum (x_i - \bar{x})^2 / n}$, $s_2 = s_1 \sqrt{n/(n - 1)}$, $s_3 = s_1/c_2$,

$m_1 = \sqrt{\pi/2}\, \sum |x_i - \bar{x}| / n$, $m_2 = m_1 \sqrt{n/(n - 1)}$,

and $R_1 = (x_L - x_S)/d_2$, where $\bar{x} = \sum x_i/n$, $x_L$ is the largest and $x_S$ the smallest of the
$x$'s, $c_2 = E(s_1)$ and $d_2 = E(x_L - x_S)$, the symbol $E(\ )$ denoting "mathematical expectation
(or mean value) of." A table is given that shows to 3 decimals the relative frequencies
(probabilities) with which these estimators tend to underestimate $\sigma$ when $n$ = 2(1)10, 12,
15, 20, 24, 30, 40, 60. The results show among other things that, for very small samples
($n \le 10$) such as chemists and physicists commonly use, Bessel's formula for the probable
error, which is based on $s_2$, has a marked downward bias in the probability sense (in
addition to its known slight downward bias in the mean value sense), whereas Peters' formula,
which is based on $m_2$, has only a slight downward bias in the probability sense and no bias
in the mean value sense. A table of divisors is given by means of which "median
estimators" of $\sigma$ can be computed readily from the basic quantities $\sum_{i=1}^{n} (x_i - \bar{x})^2$, $\sum_{i=1}^{n} |x_i - \bar{x}|$, and
$(x_L - x_S)$, that is, estimators that will over- and underestimate $\sigma$ equally often in repeated
use. An application to control charts is noted. Median estimators, like maximum
likelihood estimators ("modal estimators"), have the useful property that if $T$ is a median
estimator of $\theta$, then $f(T)$ is a median estimator of $f(\theta)$, a property unfortunately not possessed
by the customary "unbiased" ("mean") estimators.

14. Some Non-Parametric Tests of Whether the Largest Observations of a Set
are too Large. (Preliminary Report.) John E. Walsh, Douglas Aircraft
Company, Santa Monica, California.

Let $x(1), \cdots, x(n)$ represent the values of $n$ observations arranged in increasing order of
magnitude. By hypothesis these observations have the properties: (1) They are independent
and from continuous symmetrical populations. (2) For large $n$ the variances of the tail
order statistics are either very large or very small compared with the variances of the
central order statistics. (3) For large $n$ the tail order statistics are approximately independent
of the central order statistics. (4) Each observation is from a population whose median is
either $\theta$ or $\varphi$, where $x(n - r + 1), \cdots, x(n)$ are from populations with median $\theta$ while the
central and smaller order statistics are from populations with median $\varphi$. The test is:
Accept^ < e if inin lx(n - n) + <k<s^r]> 2a:(«„), where u < 

t, =« r - 1, and(„iB defined by fV [i(ta) < p 1 0 = ^] = a. Here ’ 

« « Pr [min [x(n - n) + x(jk)-, 1 < 1: < s < r] > 2^, | 0 = p). 

For large $n$ the significance level of the test is approximately $\alpha$ while the significance level
does not exceed $2\alpha$ for any value of $n$. Suitable values of $\alpha$ can be obtained for $r > 4$. As
$\theta - \varphi \to -\infty$ the power function tends to zero, while the power function tends to unity as
$\theta - \varphi \to \infty$. For $\theta - \varphi < 0$ the power function is monotonically increasing.


15. On the Bounded Significance Level Properties of the Equal-tail Sign Test 
for the Mean. John E. Walsh, Douglas Aircraft Company, Santa Monica, 
California, (Presented by Title). 




The equal-tail sign test for deciding whether the population mean is equal to a given
hypothetical value $\mu_0$ is defined by: Accept $\mu \ne \mu_0$ if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$.
Here $x_j$, $(j = 1, \cdots, n)$, is the $j$th largest of $n$ independent observations drawn from $n$
populations which satisfy the conditions: (i) The mean of each population has the value $\mu$.
(ii) Each population is continuous at its mean. (iii) The mean is at a 50% point for each
population. This paper investigates how the significance level of the equal-tail sign test
varies when (i)-(iii) are not satisfied. It is found that the significance level does not differ
noticeably from its hypothetical value under conditions much more general than (i)-(iii).
This significance level stability, combined with the properties of being easily applied and
reasonably efficient for small samples from a normal population, suggests that the
equal-tail sign test be considered for application whenever the population mean is to be tested on
the basis of a small number of observations.


16. Infinitely Divisible Distributions. William Feller, Cornell University, 
Ithaca, New York. 

A simple derivation of P. Lévy's formula is given starting from the following definition:
a distribution function $F(x)$ is infinitely divisible if for every $n$ it is possible to find finitely
many distributions $F_{k,n}(x)$ such that $F(x) = F_{1,n}(x) * \cdots * F_{N,n}(x)$ and that $F_{k,n}(x)$ tends
to the unitary distribution uniformly in $k$. This definition is more general than the one
used by P. Lévy and Khintchine. The equivalence of the two definitions was proved by
Khintchine by deep methods. The new approach renders the equivalence obvious.
Furthermore, a new characterization of infinitely divisible distributions is given; it is
equivalent to Gnedenko's characterization but requires no special analytical tools.


17. Fluctuation Theory of Recurrent Events. William Feller, Cornell Uni¬ 
versity, Ithaca, New York. 

Consider a sequence of independent or dependent trials but suppose that each has a
discrete sample space. The paper studies recurrent patterns $\mathcal{E}$ which can be roughly
characterized by the property that after every occurrence of $\mathcal{E}$ the process starts from scratch,
the conditional probabilities coinciding with the original absolute probabilities. Typical
examples are success runs, returns to equilibrium, zeros of sums of independent variables,
passages through a state in a Markov chain. New methods are developed unifying and
simplifying previous theories and applying to larger classes of recurrent events. It is shown
in an elementary way that the probability that $\mathcal{E}$ occurs at the $n$-th trial either has a limit or is
asymptotically periodic. This theorem has many consequences. For example, the ergodic
properties of discrete Markov chains follow in a few lines, and the difference between finite
and infinite chains disappears. Several theorems of the renewal type are proved. Weak
and strong limit theorems for the number $N_n$ of occurrences of $\mathcal{E}$ in $n$ trials are derived,
shedding new light on stable distributions.

18. Formulas for the Percentage Points of the Distributions of the Arithmetic
Mean in Random Samples from Certain Symmetrical Universes. Uttam
Chand, University of North Carolina and National Bureau of Standards.

Using the method of Fisher and Cornish, the $100\epsilon$% point of the distribution of the
arithmetic mean in random samples of size $N$ from any universe having finite cumulants of the
first four orders, $\kappa_1$, $\kappa_2$, $\kappa_3$, $\kappa_4$, is expressed to order $1/N^2$ as a function of $N$, the $100\epsilon$% point
of a standardized normal deviate and the quantities $\kappa_1$, $\kappa_2$, $\kappa_4/\kappa_2^2$. The numerical
coefficients are evaluated for the cases of sampling from rectangular, double-exponential,
sech and sech$^2$ distributions. The application of the resulting formulas is illustrated
numerically for $\epsilon$ = .001, .005, .010, .025, .050, .100, and .250. In the case of the rectangular
and double-exponential distributions, the results obtained for $N = 10$ are compared with
accurate values, indicating the accuracy of the formulas.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest

Personal Items 

Professor T. A. Bickerstaff has been appointed Chairman of the Department 
of Mathematics at the University of Mississippi. 

Professor Raj Chandra Bose has resigned as head of the graduate Department
of Statistics of the University of Calcutta, and has been appointed Professor of
Mathematical Statistics at the University of North Carolina beginning in the
winter of 1949. Professor Bose is an authority on the design of experiments
and is writing a book on the combinatorial mathematics of the subject. He has
also published extensive contributions to differential geometry and to
multivariate statistical analysis, and has been instrumental in developing practical
sample surveys. He served as Visiting Professor in the Institute of Statistics
at North Carolina in the winter and spring of 1948.

Mr. Hamilton Brooks's paper, "The Probable Breakdown Voltage of Paper
Dielectric Capacitors," was one of the four papers selected for a national award
by the American Institute of Electrical Engineers. His paper presents the
statistical treatment of an engineering problem and shows by experiment how
insulation strength distribution is determined by the distribution of the extreme
size of flaws.

Dr. C. West Churchman, formerly a member of the staff at the University of
Pennsylvania, was appointed Associate Professor of Philosophy at Wayne
University, Detroit 1, Michigan, starting February 1, 1948.

Dr. William G. Cochran has accepted an appointment as Professor of
Biostatistics in the School of Hygiene and Public Health of the Johns Hopkins
University and will assume this post in September. Dr. Cochran, a native of
Glasgow, Scotland, comes to Johns Hopkins from the University of North
Carolina where he served as Associate Director of the Institute of Statistics from 1946
until the present.

Dr. Louis M. Court has been promoted to an assistant professorship in the
Mathematics Department of Rutgers University.

Dr. Donald A. Darling, formerly a member of the staff at Cornell University,
has accepted an assistant professorship at Rutgers University.

Mr. Aryeh Dvoretzky has been appointed a member of the Institute for
Advanced Study, Princeton, New Jersey, for the 1948-1949 academic year.

Mr. Arnold King, formerly Director of Research in Statistical Methodology
for the Bureau of Agricultural Economics at Iowa State College, was appointed
Managing Director of National Analysts, Inc., Philadelphia on July 1, 1948.

Mr. Charles L. Marks has resigned his position as instructor of mathematics
at the University of North Carolina to accept a teaching appointment in the
Department of Statistics, The George Washington University, Washington 6,
D. C.

Miss Doris Newman has accepted an appointment at the U. S. Naval Medical
Research Laboratory, U. S. Naval Submarine Base, New London, Conn.

Dr. Ernest Rubin has been transferred from the Immigration and Naturalization
Service, General Research Section, Washington, D. C., to the European
Branch, Areas Division, Office of International Trade in the Department of
Commerce as an Economic Statistician.

Mr. David Rubinstein has been promoted from Junior Research Assistant in
the Statistical Laboratory, University of California, Berkeley, to a Teaching
Assistant.

Miss Elizabeth L. Scott, formerly an Associate and Research Assistant in the
Statistical Laboratory, University of California, Berkeley, has been promoted to
Lecturer and Research Assistant.

Dr. Gobind R. Seth, who was formerly a student at Columbia University, has
accepted an associate professorship in statistics at the Statistical Laboratory,
Iowa State College.

Dr. Charles M. Stein has been promoted to an assistant professorship in the 
Statistical Laboratory, University of California, Berkeley. 

Professor Gerhard Tintner is on leave of absence for one year from the Iowa
State College to join the Department of Applied Economics at Cambridge University,
Cambridge, England as a Research Associate.

Mr. L. H. C. Tippett, Chief Statistician of the British Cotton Industry Research
Association, delivered twelve one-hour lectures on Statistical Quality Control
and Industrial Experimentation at a conference at the Massachusetts Institute
of Technology, May 6-14, before a large audience. Dr. W. A. Shewhart of the
Bell Telephone Laboratories addressed a large audience on the Future of Statistics
in Industrial Research and Quality Control on May 14 at the same conference.


Scientists and Reserve Officers 

The Department of the Army has established a program of particular interest 
to statisticians and other scientists who hold Reserve commissions in the Army, 
and who are professionally engaged in teaching or research and development. 

The objectives of the program are to:

(1) maintain the useful affiliation of statisticians and other scientists with the 
Organized Reserve Corps, 

(2) provide peacetime Reserve assignments for these officers, enabling optimum
utilization of their education, experience and skills,

(3) furnish mobilization assignments which will fully utilize their talents, and

(4) adequately prepare these officers for mobilization.

The Technical Services of the Department of the Army submit to these Research
and Development Reserve Groups research problems and projects which
pose an intellectual challenge to members of the group. Thus, the program 
provides members of each group a type of training which is in keeping with their 
scientific and technical interests and competence, rather than a traditional 
kind of training session in which scientists have little or no interest. 





The program is now being implemented only in those areas where there is a
definite local interest. To date, eighteen Research and Development Reserve
groups have been organized. Twelve additional groups are in process of organization.
Others are in the initial stages of formation. Several of these groups
have been formed in communities in which large universities, industrial research
laboratories, or private research foundations are located. Typical localities are
Chicago, Illinois; Wilmington, Delaware; Newark, New Jersey; Houston, Texas;
Washington, D. C.; Manhattan and Lawrence, Kansas; Champaign-Urbana,
Illinois; Pittsburgh, Pennsylvania; Denver, Colorado; and Detroit, Michigan.
Provision is made to submit research projects of interest to all categories of
scientists: chemists, physicists, engineers, geologists, geographers, psychologists,
mathematicians, statisticians and all of the biological scientists.

Reserve officers who are currently engaged in civilian research, college or
university teaching, or industrial research or development, or who in the past
have had specific research experience are eligible to make application for assignment
to an Organized Reserve Research and Development Group. A group
may be organized in any locality where there are twenty (20) or more qualified
officer scientists who desire to participate in the program. A subgroup may be
organized with ten (10) qualified members.

The program is under the general direction of the Research and Development
Group, Logistics Division, General Staff, United States Army. The entire
program is outlined in Department of the Army Circular Number 127, dated 5
May 1948.

Inquiry about organization of an Organized Reserve Research and Development
Group or about assignment to a group already organized should be made
of the Unit Instructor, ORC, or of the Senior Army Instructor, ORC, in the
locality in which the officer resides. In localities in which a group has already
been organized, the Commanding Officer of the group will consider applications
for assignment of additional officers.


New Members 

The following persons have been elected to membership in the Institute
(June 1 to August 16, 1948)

A Wlalmar Tr. (Univ. of Oregon Medical School) Student, Turner, Oregon.

Banerjee, Kali Shankar, M.A. (Calcutta Univ.) Statistician, Central Sugar Cane Research

BorSroti!;?iU r^TninTr 

cowan? Tseaich^Wysb War Department, Lewis Sireel, 

East Lywi, Univ.) Research Associate, Educational Testing 

P«.«- Pnn— T«..- 

of Buffalo, 16S Winspear Avenue, Buffalo IS, New York.




Hofmann, John E., A.M. (Univ. of Minnesota) Senior Research Fellow, (MUuiil
Street, Ames, Iowa.

Kimball, Allyn W., Jr., B.S. (Univ. of Buffalo) Research Statistician, Department of
Biometrics, School of Aviation Medicine, Randolph Field, Texas.

King, Edgar P., Jr., B.S. (Carnegie Institute of Technology) Teaching Assistant in
Mathematics, Department of Mathematics, Carnegie Institute of Technology, Pittsburgh
13, Pennsylvania.

Link, Curtis K., B.S. (Univ. of Oregon) Graduate Student Assistant, 7S0 11', dth
Eugene, Oregon.

Loidflr, Nathan, B.A. (College of the City of N. Y.) Mathematician P-2, 1841 Summit
Place, Washington, D. C.

Manos, Nicholas E., M.A. (Univ. of Calif.) Meteorologist and Statistician, Rhode
Island Avenue, Washington 5, D. C.

Peters, Stefan, Ph.D. (Erlangen, Germany) Lecturer at the University of California, mi
Peralta Avenue, Berkeley, California.

Petrou, Nicholas V., M.Sc. (Harvard Univ.) Electrical Engineer, Project Engineer,
Westinghouse Electric Corporation, I8U Ardmore Blvd., Pittsburgh, Pennsylvania.

Prakash, Aditya, M.A. (Univ. of Michigan) Student, c/o Mathematics Department,
University of Michigan, Ann Arbor, Michigan.

Read, Robert R., B.S. (Oregon State College) Apprentice Engineer, Inventory and Costs
Division, Pacific Telephone and Telegraph Company, 8M .Y./i',, 30, Portland, Oregon.

Seiden, Esther, M.A. (Vilna, Poland) Research Assistant, Statistical Laboratory,
University of California, ^IIO Ikihj Hired, Berkeley 5, California.

Sodano, John J., B.S. (Queens College) Student, Mathematical Statistics, Columbia
University, iU-lB 93rd Avenue, Jamaica S, New York.

Stillinger, Richard C., M.S. (Univ. of Michigan) Graduate Student, 1388 Weston Court,
Willow Run, Michigan.

Swan, Albert W., B.A.Sc. (Univ. of Toronto) Statistical Section, Research and Development
Department, The United Steel Companies Limited, c/o The United Steel Companies
Ltd., 17 Westbourne Road, Sheffield 10, England.

Tate, Robert F., A.B. (Univ. of Calif.) Teaching Fellow, Department of Mathematical
Statistics, Phillips Hall, Chapel Hill, North Carolina.

Teichroew, Dan, B.A. (Univ. of Toronto) Division of Research, Department of Lands and
Forests, South Baymouth, Ontario, Canada.

Tyler, Leona E., Ph.D. (Univ. of Minnesota) Associate Professor of Psychology,
Department of Psychology, University of Oregon, Eugene, Oregon.



ADOPTION OF THE NEW CONSTITUTION 

The chief order of business at the business meeting of the Institute held at
Madison, Wisconsin on September 10, 1948, was the adoption of the new Constitution.
The draft mailed to the members in August, 1948, was adopted unanimously
after two changes had been made. They were: (1) the insertion of the
word “Article” before each of the respective articles and (2) the elimination of
the first “the” in the third line and fourth paragraph of Article 4.

Other business transacted at the meeting included a report of the Secretary- 
Treasurer on the financial condition of the Institute indicating that while the 
Institute is just operating within its income during 1948, steps will have to be 
taken to provide the additional revenue needed for 1949. It was decided not to 
raise dues for 1949 but to attempt to raise additional funds by: (1) an immediate
appeal to universities and other institutions which are sponsoring research in
mathematical statistics for contributions to the Institute and (2) an appeal to 
the members of the Institute to make additional contributions at the time of the 
payment of their annual dues. 

Other matters under consideration at the meeting included a reading and dis¬ 
cussion of a proposed revision of the By Laws, the announcement of the dates 
and locations of future meetings of the Institute and the passing of a resolution 
of thanks to those contributing to the success of the Madison meeting. 

A copy of the official minutes of this meeting may be obtained on request from 
the Secretary-Treasurer.

P. S. Dwyer
Secretary-Treasurer





REPORT ON THE MADISON MEETING OF THE INSTITUTE


The Eleventh Summer Meeting of the Institute of Mathematical Statistics
was held at the University of Wisconsin, Madison, Wisconsin, Tuesday, September 7
through Friday, September 10, 1948. The meeting was held in conjunction
with the summer meetings of the American Mathematical Society, the
Mathematical Association of America and the Econometric Society. The following
eighty members of the Institute attended the meeting:

C. B. Allendoerfer, V. L. Anderson, K. J. Arnold, H. M. Haam, A. S. Barr, Walter Bartky,
H. P. Beard, A. A. Bennett, T. A. Bickerstaff, J. H. Bushey, Maria Castellani, Uttam Chand,
Herman Chernoff, C. C. Craig, J. H. Curtiss, G. B. Dantzig, D. B. De Lury, J. L. Doob, A. M.
Dutton, P. S. Dwyer, Mrs. Daisy Edwards, Churchill Eisenhart, H. P. Evans, G. H. Fischer,
J. B. Freund, H. M. Gehman, H. H. Germond, M. A. Girshick, Casper Goffman, P. R. Halmos,
W. G. Hart, E. H. C. Hildebrandt, Wassily Hoeffding, D. G. Horvitz, Harold Hotelling, A. S.
Householder, M. H. Ingraham, Leo Katz, Oscar Kempthorne, J. F. Kenney, W. M. Kincaid,
T. C. Koopmans, H. D. Larsen, Walter Leighton, H. B. Mann, A. M. Mark, Jacob Marschak,
A. W. Marshall, Kenneth May, M. R. Mickey, Jr., Dorothy J. Morrow, C. J. Nesbitt, M. J.
Netzorg, John von Neumann, Jerzy Neyman, G. B. Price, C. J. Rees, J. B. Rhodes, P. R.
Rider, F. D. Rigby, Herman Rubin, Arthur Sard, Henry Scheffé, E. D. Schell, I. E. Segal,
G. R. Seth, W. B. Simpson, Andrew Sobczyk, E. W. Stacy, C. M. Stein, A. G. Swanson,
Zenon Szatrowski, R. M. Thrall, A. W. Tucker, J. W. Tukey, W. A. Wallis, J. E. Walsh,
J. B. Wilkins, Jr., S. S. Wilks, M. A. Woodbury.

The Tuesday morning session was devoted to contributed papers. Professor
K. J. Arnold of the University of Wisconsin presided. The attendance was
approximately forty. The following papers were presented:

1. On Distribution-free Confidence Intervals. Preliminary Report.

Dr. Wassily Hoeffding, Institute of Statistics, University of North Carolina.

2. On Certain Statistics for Samples of 3 from a Normal Population.

Mr. Julius Lieblein, Statistical Engineering Laboratory, National Bureau of Standards.
Presented by Dr. Churchill Eisenhart.

3. On Multinomial Distributions with Limited Freedom: Stochastic Genesis of Pareto's
and Pearson's Curves.

Professor Maria Castellani, University of Kansas City.

4. Fitting Generalized Truncated Normal Distributions.

Professor Harold Hotelling, Institute of Statistics, University of North Carolina.

5. On the Distribution of the Two Closest Observations Among a Set of Three Independent
Observations.

Professor G. R. Seth, Statistical Laboratory, Iowa State College.

6. The Derivation of Certain Recurrence Formulae and their Application to the Extension
of Existing Published Incomplete Beta Function Tables.

Dr. T. A. Bancroft, Alabama Polytechnic Institute. (Presented by title.)

On Tuesday afternoon a session for contributed papers was held jointly with 
the American Mathematical Society. Professor P. S. Dwyer of the University 
of Michigan presided. The attendance was approximately eighty. The following
papers were presented:




7. Asymptotic Studentization in Testing Hypothesis.

Mr. Herman Chernoff, Cowles Commission, University of Chicago.

8. Completeness, Similar Regions and Unbiased Estimation. Preliminary Report.

Professor E. L. Lehmann, University of California and Professor Henry Scheffé,
University of California at Los Angeles.

9. On a Proposed Method for Estimating Populations.

Professor C. C. Craig, University of Michigan.

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi-maximum-likelihood
Estimates.

Dr. Herman Rubin, Institute for Advanced Study.

11. The Probability Points of the Distribution of the Median in Random Samples from any
Continuous Population.

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering
Laboratory, National Bureau of Standards.

12. On the Arithmetic Mean and the Median in Small Samples from the Normal and Certain
Non-normal Populations.

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering
Laboratory, National Bureau of Standards.

13. The Relative Frequencies with which Certain Estimators of the Standard Deviation of a
Normal Population Tend to Underestimate Its Value.

Dr. Churchill Eisenhart and Celia S. Martin, Statistical Engineering Laboratory,
National Bureau of Standards.

14. Some Non-parametric Tests of Whether the Largest Observations of a Set are too Large.
Preliminary Report.

Dr. J. E. Walsh, Project Rand, Santa Monica, California.

15. On Some Bounded Significance Level Properties of the Equal-tail Sign Test for the
Mean.

Dr. J. E. Walsh, Project Rand, Santa Monica, California. (Presented by title.)

16. Infinitely Divisible Distributions.

Professor Will Feller, Cornell University. (Presented by title.)

17. Fluctuation Theory of Recurrent Events.

Professor Will Feller, Cornell University. (Presented by title.)

18. Formulae for the Percentage Points of the Distributions of the Arithmetic Mean in
Random Samples from Certain Symmetrical Universes.

Mr. Uttam Chand, University of North Carolina and National Bureau of Standards.
(Presented by title.)


Abstracts of the contributed papers appear elsewhere in this issue of the Annals.

On Wednesday morning the Institute and the Econometric Society held a joint
session on Stochastic Processes with Professor Harold Hotelling of the University
of North Carolina presiding. Attendance was approximately ninety. Professor
Hotelling presented an Historical Summary of the Problem. Professor J. L. Doob
of the University of Illinois presented a paper, Stochastic Difference Equations and
Stochastic Differential Equations. Professor Subrahmanyan Chandrasekhar
of the University of Chicago presented a paper, Brownian Motion, Dynamical
Friction and Stellar Dynamics.

The three joint sessions of the Institute and the Econometric Society on Thursday
were devoted to a symposium on the Theory of Games. The first session was
under the chairmanship of Professor S. S. Wilks of Princeton University.





Professor John von Neumann of the Institute for Advanced Study presented a
paper, Survey of the Theory of Games. Professor Oskar Morgenstern of Princeton
University presented a paper, Economics and the Theory of Games. Dr. M. A.
Girshick of Project Rand presented a paper, Statistics and the Theory of Games.
The second morning session was under the chairmanship of Professor John von
Neumann of the Institute for Advanced Study. Dr. E. W. Paxson of Project
Rand presented a paper, Recent Developments. Professor J. W. Tukey of Princeton
University presented a paper, A Problem in Strategy. Dr. G. B. Dantzig of
the Army Air Forces presented a paper, Programming in a Linear Structure. The
final session of the symposium was a round table discussion with Professor John
von Neumann of the Institute for Advanced Study as chairman and with the
following participants: Dr. G. B. Dantzig, Dr. M. A. Girshick, Professor Harold
Hotelling, Professor Irving Kaplansky, Professor Samuel Karlin, Dr. J. C. C.
McKinsey, Professor Oskar Morgenstern, Dr. E. W. Paxson, Dr. L. S. Shapley,
and Professor J. W. Tukey.

A membership business meeting was held on Friday, September 10, in Bascom
Hall at which twenty-one members were present. An account of the business
transacted at this meeting may be found elsewhere in this issue under the heading
“Adoption of a New Constitution.”

The final session was on Sequential Estimation and was held jointly with the
Econometric Society on Friday morning with Professor Jerzy Neyman of the
University of California presiding. Attendance was approximately fifty. Professor
Charles Stein of the University of California presented a paper on Sequential
Estimation. Professor W. A. Wallis of the University of Chicago presented
a discussion.

Social affairs during the meeting included a tea Tuesday afternoon, a concert
of the Pro Arte String Quartet Tuesday evening, a dinner Wednesday evening,
a picnic Thursday afternoon, and a beer party Thursday evening.

K. J. Arnold
Assistant Secretary








